Protein engineering

ABSTRACT

A method of protein engineering is provided wherein a searchable computer database is created comprising entries in the form of descriptions of a location and orientation in 3D space of side chains of the constituent amino acid residues of a framework protein. A query is created which corresponds to a description of a location and orientation in 3D space of respective side chains of amino acid residues of a sample protein. The location and orientation in 3D space of constituent side-chains is preferably described as a Cα-Cβ vector. The query is used to search the database and thereby identify a hit which corresponds to a framework protein having structural similarity with said sample protein. Framework protein “hits” so identified may be suitable candidates for further modification. A particular advantage of the present invention is that a modified framework protein may display one or more desired characteristics, such as a function similar to or inhibitory of the sample protein.

FIELD OF THE INVENTION

THIS INVENTION relates to a method of identifying proteins suitable for protein engineering. In particular, the present invention relates to a computer database searching method of identifying proteins according to aspects of three-dimensional structure, and furthermore to the modification of proteins so identified to thereby possess one or more desired characteristics. Although not limited thereto, this invention relates to engineered proteins such as cytokine mimetics.

BACKGROUND OF THE INVENTION

Proteins are central to life due to their crucial involvement in a variety of biological processes, such as enzyme catalysis of biochemical reactions, control of nucleic acid transcription and replication, hormonal regulation, signal transduction cascades and antigen recognition during immune responses.

In many cases, one or more structural regions of a protein are responsible for a particular function, hereinafter referred to as “functional regions”. These regions may constitute the active site of a protein enzyme, the nucleic acid binding domain of a transcription factor, a region of a protein cytokine crucial to binding the specific receptor for that cytokine, or antigen-binding regions of antigen receptors.

A functional region of a protein usually comprises one or more amino acids which are required for that particular function, that is, they are essential for that function.

In many cases, although these required amino acid residues are topographically proximal to each other, they may be well separated with respect to primary amino acid sequence, that is, they are non-contiguous. In addition, where there is more than one functional region of a protein, these regions may also be topographically proximal, but well separated in terms of primary amino acid sequence. In some cases, however, where there is more than one functional region involved in a particular function, these functional regions may also be topographically well separated. This is a particularly important point with regard to the functional regions of cytokines.

“Cytokine” as used herein includes and encompasses soluble protein molecules which have a cognate cell surface receptor, and which are involved in initiating, controlling and otherwise regulating a variety of processes relevant to cell growth, death and differentiation. Cytokines are typically exemplified by interferons (e.g. IFN-γ), interleukins (for example IL-2, IL-4 and IL-6), growth and differentiation factors [e.g. granulocyte colony stimulating factor (G-CSF) and erythropoietin (EPO)] and others such as growth hormone (GH), prolactin, TGF-β, tumour necrosis factor (TNF) and insulin. Each of these molecules is capable of binding a specific receptor and thereby eliciting a particular biological response or set of responses.

The fact that a particular function of a protein can be attributed to one or more functional regions of that protein has formed the basis for strategies aimed at modifying a protein by adding or subtracting functional regions to modify the function of that protein.

In this regard, the design and engineering of cytokine mimetics has become an area of major importance, as many cytokine-cytokine receptor interactions are central to the regulation of a variety of biological processes. It is envisaged that new mimetics will therefore become important new therapeutic agents that either mimic or inhibit the biological response to cytokine-cytokine receptor interactions.

A “mimetic” is a molecule which elicits a biological response either similar to, or more powerful than, that of another molecule (an “agonist”), or inhibits the action of the other molecule (an “antagonist”). The other molecule may be a cytokine, for example.

With regard to designing and engineering mimetics based on cytokines, a problem frequently encountered with many engineered mimetics has been that they exhibit short biological half-lives and hence minimal bioavailability and efficacy. In this regard, it has been proposed that small cysteine-rich proteins might be useful as protein “scaffolds” as a basis for engineering mimetics, due to their stability (Vita et al., 1995, Proc. Natl. Acad. Sci. USA 92 6404). These small cysteine-rich proteins comprise a disulfide-bonded core and exposed amino acid side chains at the protein surface (Neilsen et al., 1996, J. Mol. Biol. 263 297). However, the full potential of these proteins has not been realized due to the fact that typical prior art strategies for protein engineering have largely been limited to transferring or exchanging contiguous groups of amino acids within individual secondary structural elements, such as loops or helices or β-sheets, and no design strategies exist for selecting the most appropriate disulfide-rich candidate.

Examples of such an approach would include: the exchange of secondary structural regions between RNase and angiogenin, either to confer RNase activity on angiogenin (Harper et al., 1989, Biochemistry 28 1875) or angiogenic activity on RNase (Raines et al., 1995, J. Biol. Chem. 270 17180); the insertion of elastase inhibition activity into IL-1β by transfer of the protease inhibitor loop of elastase to the IL-1β scaffold (Wolfson et al., 1993, Biochemistry 32 5327); the insertion of a 10 amino acid calcium-binding loop of thermolysin into Bacillus subtilis neutral protease (Toma et al., 1991, Biochemistry 30 97); the insertion of a β-sheet from a snake toxin to replace the β-sheet of charybdotoxin (Drakopolou et al., 1996, J. Biol. Chem. 271 11979); and the incorporation of a β-sheet from carbonic anhydrase into the β-sheet of charybdotoxin (Pierret et al., 1995, J. Med. Chem. 35 2145).

Of growing importance in protein engineering has been the use of computer based technology combined with the elucidation of the 3D structures of small molecules and macromolecules. 3D molecular structures are being generated at an increasing rate, such as by X-Ray crystallography and NMR techniques. These 3D features can be stored in generally accessible, searchable databases, such as the BROOKHAVEN database.

For the purposes of this specification, a database will comprise a collection of “entries”, each entry corresponding to a representation of an aspect of 3D structure of a framework protein. A framework protein is simply any protein for which a 3D structure exists, either by experimental elucidation or by predictive means such as computer modelling. A framework protein is potentially useful as a scaffold which can be structurally modified for the purposes of imparting a particular function thereto.

A “query” refers herein to a representation of an aspect of 3D structure of a protein which exhibits a function of interest. The representation of 3D structure would be in a form suitable for searching a database with the intention of identifying a “hit”. A hit is an entry identified according to the particular query and the algorithm used to perform the search.

An important advance in database searching has been made by representing 3D structures in terms of the relationship between atoms located in “distance space”, rather than “Cartesian space” (Jakes & Willett, 1986, J. Mol. Graphics 4 12; Ho & Marshall, 1993, J. Comp. Aided. Mol. Des. 7 3). A location in Cartesian space is defined by three coordinates (x, y, z) which each correspond to a position along three respective axes (X, Y, Z), each axis being oriented at right angles to the other two.

A location in distance space, however, is defined by distances between atoms, expressed in the form of a distance matrix, which details the distance between atoms. Distance matrices are therefore coordinate independent, and comparisons between distance matrices can be made without restriction to a particular frame of reference, such as is required using Cartesian coordinates.

It is important to emphasise that an arrangement of atoms and its mirror image are described by identical distance matrices. A root mean squared (RMS) difference can be used to alleviate this ambiguity.
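The coordinate independence of a distance matrix, and its blindness to mirror images, can be illustrated with a minimal Python sketch. The point set and all function names here are illustrative assumptions, not part of the specification. To expose the mirror-image ambiguity the sketch uses a signed volume (a simple handedness check) rather than the RMS difference mentioned above, purely because it is shorter to show.

```python
import math

def distance_matrix(points):
    """Return the full matrix of pairwise Euclidean distances."""
    return [[math.dist(p, q) for q in points] for p in points]

def rotate_z(point, angle):
    """Rotate a point about the Z axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)

def translate(point, shift):
    return tuple(a + b for a, b in zip(point, shift))

def mirror_x(point):
    """Reflect a point through the YZ plane."""
    return (-point[0], point[1], point[2])

def matrices_match(a, b, tol=1e-9):
    return all(abs(x - y) <= tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def signed_volume(p0, p1, p2, p3):
    """Triple product (p1-p0) . ((p2-p0) x (p3-p0)); its sign encodes handedness."""
    ax, ay, az = (p1[i] - p0[i] for i in range(3))
    bx, by, bz = (p2[i] - p0[i] for i in range(3))
    cx, cy, cz = (p3[i] - p0[i] for i in range(3))
    return ax * (by * cz - bz * cy) - ay * (bx * cz - bz * cx) + az * (bx * cy - by * cx)

# Four non-coplanar points (hypothetical coordinates, in angstroms).
points = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 2.0, 0.0), (0.3, 0.4, 2.5)]
moved = [translate(rotate_z(p, 0.7), (10.0, -3.0, 4.0)) for p in points]
mirrored = [mirror_x(p) for p in points]

same_after_motion = matrices_match(distance_matrix(points), distance_matrix(moved))
same_after_mirror = matrices_match(distance_matrix(points), distance_matrix(mirrored))
```

Both comparisons succeed, yet `signed_volume(*points)` and `signed_volume(*mirrored)` have opposite signs, which is why a further check is needed after a distance-matrix match.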

With regard to the 3D structure of proteins, a simplification of protein structure can be provided by reducing a 3D structure to “Cα-Cβ vectors” as discussed in McKie et al., 1995, Peptides: Chemistry, Structure & Biology p 354-355. A Cα-Cβ vector occupies a location in 3D space, the location being defined by the orientation of the covalent bond between the α carbon and β carbon atoms of an amino acid (Lauri & Bartlett, 1994, J. Comp. Aid. Mol. Des. 8 51). It will be appreciated that each of the 20 naturally-occurring constituent amino acids of a protein (except glycine) possesses a Cα-Cβ vector due to the covalent bond between the “central” α carbon and the β carbon of the constituent side chain.

For those proteins containing Gly in the database, it is possible to mutate this to Ala to generate the required Cα-Cβ vector for database searching.
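As a rough illustration of reducing a structure to Cα-Cβ vectors, the following Python sketch reads CA and CB coordinates from fixed-column PDB-style ATOM records. It is an assumption that the structures are held in that format; `pdb_atom_line` is a hypothetical helper used only to build demonstration input. Residues with no CB atom (glycine) are simply skipped here, whereas the text above describes mutating Gly to Ala to obtain a vector.

```python
def pdb_atom_line(serial, name, res, chain, resseq, x, y, z):
    """Hypothetical helper: build a minimal fixed-column ATOM record for the demo."""
    return (f"ATOM  {serial:5d} {name:^4s} {res:3s} {chain}{resseq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")

def ca_cb_vectors(pdb_lines):
    """Map (chain, residue number, residue name) -> (CA xyz, CB xyz)."""
    atoms = {}
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        name = line[12:16].strip()
        if name not in ("CA", "CB"):
            continue
        key = (line[21], int(line[22:26]), line[17:20])
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        atoms.setdefault(key, {})[name] = xyz
    # Keep only residues that have both atoms; glycine has no CB.
    return {key: (a["CA"], a["CB"]) for key, a in atoms.items()
            if "CA" in a and "CB" in a}

demo_lines = [
    pdb_atom_line(1, "N", "ALA", "A", 1, 0.0, 1.0, 2.0),
    pdb_atom_line(2, "CA", "ALA", "A", 1, 1.0, 2.0, 3.0),
    pdb_atom_line(3, "CB", "ALA", "A", 1, 1.5, 3.2, 3.9),
    pdb_atom_line(4, "CA", "GLY", "A", 2, 4.0, 2.0, 3.0),
]
vectors = ca_cb_vectors(demo_lines)
```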

The usefulness of Cα-Cβ vectors is that they provide a simplification of 3D structure. Therefore, only the amino acid side-chains of a functional region of a protein need be represented by the Cα-Cβ vector map, thereby excluding the substantial portion of the protein(s) not directly involved in that particular function. For the purposes of database searching, Cα-Cβ vectors are ideal, as they constitute the basic 3D structural information needed.

After identification of Cα-Cβ vectors corresponding to a protein or a functional region thereof, the parameters that characterize each vector must be stored in a database in such a way that retrieval in response to a query can be made quickly. A number of options are available for suitable representation of Cα-Cβ vectors, whether as a database entry or as a query:

-   (A) as a distance matrix;
-   (B) as a dihedral angle (δ) formed between respective Cα-Cβ vectors;
-   (C) as angles α₁ and α₂ formed between respective Cα-Cβ vectors.
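The three representations can be computed for a single pair of Cα-Cβ vectors as sketched below. The precise angle definitions are assumptions made for this illustration: d is taken as the Cα₁-Cα₂ distance, α₁ as the angle Cβ₁-Cα₁-Cα₂, α₂ as the angle Cβ₂-Cα₂-Cα₁, and δ as the dihedral Cβ₁-Cα₁-Cα₂-Cβ₂. The cited reference should be consulted for the authoritative definitions and sign convention.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def norm(a):
    return math.sqrt(dot(a, a))

def angle_deg(a, b, c):
    """Angle at vertex b formed by points a-b-c, in degrees."""
    u, v = sub(a, b), sub(c, b)
    cos_t = max(-1.0, min(1.0, dot(u, v) / (norm(u) * norm(v))))
    return math.degrees(math.acos(cos_t))

def dihedral_deg(p0, p1, p2, p3):
    """Dihedral angle about the p1-p2 axis, in degrees (atan2 form)."""
    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n2 = cross(b2, b3)
    y = norm(b2) * dot(b1, n2)
    x = dot(cross(b1, b2), n2)
    return math.degrees(math.atan2(y, x))

def describe_pair(v1, v2):
    """v1 and v2 are (CA xyz, CB xyz) tuples; returns d, alpha1, alpha2, delta."""
    (ca1, cb1), (ca2, cb2) = v1, v2
    return {
        "d": math.dist(ca1, ca2),
        "alpha1": angle_deg(cb1, ca1, ca2),
        "alpha2": angle_deg(cb2, ca2, ca1),
        "delta": dihedral_deg(cb1, ca1, ca2, cb2),
    }

# Idealised demonstration geometry (not taken from any real protein).
pair = describe_pair(((0.0, 0.0, 0.0), (0.0, 1.0, 0.0)),
                     ((1.0, 0.0, 0.0), (1.0, 0.0, 1.0)))
```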

A simple explanation of these representations is provided in Lauri & Bartlett, 1994, supra, which is hereinafter incorporated by reference. The key to successful database searching is speed and efficiency. Thus, computer search algorithms have been developed which use a strategy whereby the vast majority of entries in the database are eliminated in a preliminary screening step.

These algorithms are demanding of computer resources, and therefore a search is normally effected in two stages:

-   (1) a screening search to eliminate entries that cannot possibly constitute a hit; and
-   (2) an atom-by-atom comparison of a query with each entry not eliminated in (1), to identify one or more hits.
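The two-stage idea can be sketched in Python as follows. This is a toy illustration, not any of the cited algorithms: the screen rejects an entry when some query distance has no counterpart among the entry's distances, and the detailed stage brute-forces point assignments, which is only practical for very small queries. All names and coordinates are hypothetical.

```python
import itertools
import math

def pair_distances(points):
    return [math.dist(p, q) for p, q in itertools.combinations(points, 2)]

def passes_screen(query, entry, tol):
    """Stage 1: cheap test that can only reject entries that cannot be hits."""
    if len(entry) < len(query):
        return False
    entry_d = pair_distances(entry)
    return all(any(abs(dq - de) <= tol for de in entry_d)
               for dq in pair_distances(query))

def detailed_match(query, entry, tol):
    """Stage 2: exhaustive point-by-point comparison; returns entry indices or None."""
    n = len(query)
    for idx in itertools.permutations(range(len(entry)), n):
        if all(abs(math.dist(query[i], query[j])
                   - math.dist(entry[idx[i]], entry[idx[j]])) <= tol
               for i, j in itertools.combinations(range(n), 2)):
            return idx
    return None

def two_stage_search(query, database, tol=0.5):
    hits = {}
    for name, entry in database.items():
        if not passes_screen(query, entry, tol):
            continue  # eliminated without the expensive comparison
        match = detailed_match(query, entry, tol)
        if match is not None:
            hits[name] = match
    return hits

query = [(0, 0, 0), (3, 0, 0), (0, 4, 0)]
database = {
    "match": [(10, 10, 10), (13, 10, 10), (10, 14, 10), (20, 20, 20)],
    "too_small": [(0, 0, 0), (3, 0, 0)],
    "wrong_shape": [(0, 0, 0), (1, 0, 0), (0, 1, 0)],
}
hits = two_stage_search(query, database)
```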

The search in (1) could screen entries based on geometric attributes of the query (Lesk, 1979, Commun. ACM 22 219), interatomic distances and atom types (Jakes & Willett, 1986, supra), aromaticity, hybridization, connectivity, charge, position of lone pair electrons, or centre of mass of ring structures (Sheridan et al., 1989, Proc. Natl Acad. Sci. 86 8165). This screening process would eliminate entries that have no chance of meeting the 3D constraints of the query.

This strategy, although quick, requires that for an entry to register as a hit, it must comprise every specified query component. As the number of query components increases, the number of near misses increases and the likelihood of finding a hit decreases.

A more useful search strategy which assesses the relative merits of each near miss as well as each hit has recently been provided by the search program FOUNDATION (Ho & Marshall, 1993, supra). FOUNDATION uses a clique-detection algorithm (various algorithms are reviewed and compared in Brint & Willett, 1987, J. Mol. Graphics 5 49 and Brint & Willett, 1987, Chem. Inf. Comput. Sci. 27 152) which searches a 3D database of entries for a user-defined query consisting of the coordinates of various atoms and/or bonds of a 3D structural feature. FOUNDATION identifies all possible entries that contain any combination of a user-specified minimum number of matching atoms and/or bonds as hits.

Despite the usefulness of 3D database searching as a means of identifying structurally related proteins, this approach has not been well utilized with respect to engineering proteins with a desired function.

OBJECT OF THE INVENTION

The present inventors have recognized that 3D database searching is useful for identifying proteins which have one or more desired structural features, such proteins being candidate “frameworks” for the subsequent engineering of proteins with desired characteristics or functions. Furthermore, the present inventors have realized that protein engineering is best achieved by modification of a framework protein to incorporate particular amino acid residues required for a characteristic, property or function, rather than by incorporating entire elements of secondary structure such as loops or helices. This is particularly applicable when functionally important amino acids are scattered throughout a protein and are not confined to particular regions of primary or secondary structure.

It is therefore an object of the present invention to provide a novel method of protein engineering.

SUMMARY OF THE INVENTION

In one aspect, the present invention resides in a method of protein engineering including the steps of:

-   (i) creating a computer database which includes a plurality of entries, each said entry corresponding to a description of a location and orientation in 3D space of side chains of amino acid residues of a framework protein;
-   (ii) creating a query corresponding to a description of a location and orientation in 3D space of respective side chains of two or more amino acid residues of a sample protein which are required for a function of said sample protein; and
-   (iii) searching said database with said query to thereby identify one or more hits wherein at least one of said hits corresponds to a respective said framework protein having structural similarity with said sample protein.

Preferably, the framework protein(s) identified as a hit has greater stability than said sample protein.

In one embodiment, the method includes the step of modifying an amino acid sequence of said framework protein which corresponds to a hit, by substituting at least one amino acid residue thereof with at least one amino acid residue of said sample protein to thereby create a modified framework protein having increased structural similarity to said sample protein.

Preferably, said at least one amino acid residue of said sample protein represents at least a portion of a functional region of said sample protein.

More preferably, at least two of the amino acid residues of said sample protein which substitute amino acid residues of said framework protein are non-contiguous in primary sequence.

Preferably, the framework protein so modified has increased structural similarity to said sample protein.

Advantageously, the modified framework protein is capable of exhibiting a function which is either similar to, or inhibitory of, a function of said sample protein.

Preferably, the location and orientation of each amino acid side-chain of said framework protein and said sample protein is represented by a Cα-Cβ vector.

Preferably, the framework protein is a small cysteine-rich protein capable of internal disulfide bond formation. More preferably, said small cysteine-rich protein comprises 70 amino acids or less, having 2-11 disulfide bonds.

In one embodiment, said sample protein is a cytokine selected from the group consisting of GH, IL-4, IL-6 and G-CSF.

In a second aspect, the invention provides an engineered protein comprising 70 amino acid residues or less of a framework protein and 2-11 disulfide bonds of said framework protein, together with at least two amino acids of another protein which are non-contiguous in primary sequence and which represent at least a portion of a functional region of said another protein.

Preferably, the engineered protein has greater stability than said another protein.

More preferably, the engineered protein exhibits a function either similar to, or inhibitory of, said another protein.

In one embodiment, said another protein is a cytokine selected from the group consisting of GH, IL-4, IL-6 and G-CSF.

In a particular embodiment, the engineered protein has an amino acid sequence selected from the group consisting of the amino acid sequences of SCY01, SCY02, SCY03, ERP01, ERP02, ERP03 and VIB01.

In yet another aspect, the present invention resides in a computer program for searching a protein structure database.

In one embodiment, the computer program is for searching a protein database comprising a plurality of entries, each said entry corresponding to a distance matrix representation of two or more Cα-Cβ vectors, said program including the steps of:

-   (i) comparing a query with each said database entry, said query corresponding to a distance matrix representation of two or more Cα-Cβ vectors; and
-   (ii) identifying hits by clique detection, wherein a hit is defined according to a minimum number of Cα-Cβ vector matches between said query and each said entry.

Throughout this specification and claims which follow, unless the context requires otherwise, “comprise”, “comprises” and “comprising” are used inclusively, so that a stated integer or integer group does not exclude other integers or integer groups.

It will also be appreciated that throughout this specification and claims, scientific terms are to be given their usual scientific meaning, although certain terms are defined herein to assist interpretation by the skilled person.

BRIEF DESCRIPTION OF THE FIGURES AND TABLES

Table 1: An example of a query file which defines the query Cα-Cβ vectors, the tolerance for each query atom and the definition of a subset.

Table 2: Blood serum stability test results of a solution of SCY01.

Table 3: Enzyme stability test results of a solution of SCY01.

FIG. 1: Amino acid sequences of the hGH high affinity site antagonist framework scyllatoxin, the hGH antagonists SCY01, SCY02, SCY03 and their alignment with the hGH sequence. Disulfide linkages are indicated by lines connecting cysteines.

FIG. 2: Amino acid sequences for the hGH agonist framework VIB, the engineered molecule VIB01 and the alignment with the hGH sequence. Disulfide linkages are indicated by lines connecting cysteines.

FIG. 3: Comparison of the hGH structure with hGH agonist molecule VIB01 showing the very high degree of overlap of the alpha helices.

FIG. 4: Schematic overview of database searching strategy.

FIG. 5: Two-dimensional depiction of three different representations of a pair of Cα-Cβ vectors: d=interatomic distance as used to construct distance matrices; δ=dihedral angle; α₁ and α₂ angles.

FIG. 6: Circular dichroism spectra of SCY01 showing little change in the structure on temperature changes or on the addition of the helix stabilizing agent trifluoroethanol.

FIG. 7: Structure of the engineered SCY01 molecule shown in comparison with the native scyllatoxin molecule.

FIG. 8: Biological effect of SCY01 on BaF3 cell proliferation by inhibiting the growth response of the cells to 0.5 ng/mL hGH, but not to 50 U/mL IL-3.

FIG. 9: Amino acid sequence for the low affinity site hGH antagonist framework ZDC and the engineered hGH antagonist ZDC05 and the aligned hGH sequence. Disulfide linkages are indicated by lines connecting cysteines.

FIG. 10: Circular dichroism spectra of VIB01.

FIG. 11: Amino acid sequences of the hGH agonist framework ERP, the engineered molecules ERP01, ERP02, ERP03 and their alignment with the hGH sequence. Disulfide linkages are indicated by lines connecting cysteines.

FIG. 12: Circular dichroism spectra of ERP03 showing little change in the structure on temperature changes or on the addition of the helix stabilizing agent trifluoroethanol.

FIG. 13: Comparison of secondary Hα shifts for ERP01 and ERP03 showing substantially identical structure and disulphide connectivities. The shaded bars show the invariant residues of the native ERP molecule. -▪-=ERP03 δHA; -♦-=ERP δHA.

FIG. 14: Amino acid sequences of the CD4 frameworks PTA and SCY, the engineered molecules PTA CD4 and SCY CD4, and the alignment with the CD4 sequence. Disulfide linkages are indicated by lines connecting cysteines.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

It will be appreciated that the present invention is predicated, at least in part, on the present inventors' realization that in order to identify framework proteins suitable for further modification by protein engineering, it is advantageous to search databases according to the orientation in 3D space of constituent amino acid side-chains of the framework protein, with respect to constituent amino acid side-chains of the sample protein which is the subject of the query. Framework protein “hits” so identified suitably share similarity, such as in terms of topography and chemistry, to the sample protein “query”, and as such may be suitable candidates for further modification. A particular aspect of the present invention is that a modified framework protein may display one or more desired characteristics, such as increased stability and in some cases a function similar to or inhibitory of the sample protein.

Referring to the method of the first-mentioned aspect, preferably, the 3D location and orientation of each amino acid side-chain of said framework protein is represented as a Cα-Cβ vector.

Preferably, each said entry corresponds to a description in the form of a distance matrix representation of said Cα-Cβ vectors.

Alternatively, said Cα-Cβ vectors may be represented by dihedral angles or α₁ and α₂ angles.

As used herein, “protein” and “polypeptide” are used interchangeably with regard to amino acid polymers. A “peptide” is a protein which has no more than fifty (50) amino acids.

As used herein, a “framework protein” is any protein which exhibits one or more desired structural features which provide advantages which include size, solubility and/or stability. “Stability” in this context includes resistance to degradation by proteolytic enzymes and/or temperature variation and/or resistance to denaturation by chaotropic agents and/or denaturing detergents, changes in pH, pH extremes, and/or REDOX extremes and/or changes.

Preferably, the framework protein is a small cysteine-rich protein capable of internal disulfide bond formation. More preferably, said framework protein is a small cysteine-rich protein comprising 70 amino acids or less, having 2-11 disulfide bonds.

The amino acids used for creating each said entry may include some or all of the constituent amino acids of the framework protein.

As used herein, a “sample protein” is a protein which has one or more functional characteristics of interest which render it desirable for the purposes of protein engineering.

Suitably, the sample protein may be an enzyme, nucleic acid-binding protein, cytokine, antigen, receptor, ion channel, chaperonin, or any protein with a function of interest.

In an embodiment, said sample protein is a cytokine selected from the group consisting of GH, IL-4, G-CSF, IL-6 and EPO.

Preferably, said function of said sample protein comprises binding a specific receptor to thereby elicit a biological response. However, a variety of other functions are contemplated, such as catalysis, binding cations (Zn⁺⁺, Ca⁺⁺, Mg⁺⁺), transporting ions (e.g. Cl⁻, K⁺, Na⁺), binding lipids, binding nucleic acids as a means of transcriptional regulation or regulating DNA replication, assisting protein folding and transport, and any other function carried out by proteins.

With regard to creating a query, it is preferred that the location and orientation in 3D space of each side-chain of said two or more amino acid residues is simplified as a Cα-Cβ vector.

Preferably, said query corresponds to a description in the form of a distance matrix representation of Cα-Cβ vectors. However, other representations such as dihedral angles or α₁ and α₂ angles may also be applicable.

Preferably, said computer program used for searching said database is the VECTRIX program, as will be described in detail hereinafter. VECTRIX incorporates the FOUNDATION algorithm (Ho & Marshall, 1993, supra, which is herein incorporated by reference). Program FOUNDATION searches 3D databases of small organic molecules to identify structures that contain any combination of a user-specified minimum number of matching elements of a user-defined query. It achieves this by first using a distance matrix to define the topography of the query atoms, followed by screening using various query constraints which define the chemical nature of the structure. The topology of the atoms in the structure is again represented using a distance matrix. Structural fragments in the database whose distance descriptions match those of the query are identified using graph theory (Gibbons, Algorithmic Graph Theory; Cambridge University Press: Cambridge, 1988).

In graph theory, a graph is a structure composed of nodes (vertices) connected by edges. A graph is completely connected when all nodes are connected to one another. A subgraph is any subset of a larger graph. The largest completely connected subgraph of any graph is called a clique. Thus, the query is a completely connected graph, as all interatomic distances are determined in the distance matrix. The task is then to search a structural database to find all cliques that contain at least a user-defined number of matching nodes.

There are many clique-finding algorithms. Some of the well known procedures include those by Bonner, 1964, IBM J. Res. Develop., 8 22; Gerhards & Lindenberg, 1981, Computing 27 349 and Bron & Kerbosch, 1973, Commun. ACM 16 575. Computational chemists have adapted these algorithms or implemented similar ideas to facilitate searching for 3D structures within databases (Kuntz et al., 1982, J. Mol. Biol. 161 269; DesJarlais et al., 1988, J. Med. Chem. 31 722; DesJarlais et al., 1990, Proc. Natl. Acad. Sci. 87 6644; Crandell & Smith, 1983, J. Chem. Infr. Comput. Sci. 23 186; Brint & Willett, 1987, J. Mol. Graphics 5 49-56; Kuhl et al., 1984, J. Comput. Chem. 5 24 and Smellie et al., 1991, J. Chem. Inf. Sci. 31 386).
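For orientation, a minimal Python sketch of the basic Bron & Kerbosch recursion (without pivoting) is given below. It enumerates maximal completely connected subgraphs and keeps those with at least a given number of nodes. It is a generic textbook form, not the specific variant used in FOUNDATION; the graph and the name `min_nodes` are illustrative assumptions.

```python
def bron_kerbosch(adj, r=frozenset(), p=None, x=frozenset()):
    """Yield every maximal clique of the graph given as {node: set(neighbours)}."""
    if p is None:
        p = frozenset(adj)
    if not p and not x:
        yield r  # r cannot be extended by any further node
        return
    for v in list(p):
        yield from bron_kerbosch(adj, r | {v}, p & adj[v], x & adj[v])
        p = p - {v}   # v has been fully explored
        x = x | {v}   # exclude v from later cliques at this level

def cliques_at_least(adj, min_nodes):
    return [set(c) for c in bron_kerbosch(adj) if len(c) >= min_nodes]

# Small demonstration graph: a triangle a-b-c with a pendant node d attached to c.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
all_cliques = sorted(sorted(c) for c in bron_kerbosch(graph))
large_cliques = cliques_at_least(graph, min_nodes=3)
```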

Computer Database Searching

VECTRIX

The present inventors have created a program “VECTRIX”, which is a modified version of the clique-detection algorithm in program FOUNDATION as described by Ho & Marshall, 1993, J. Comp. Aided. Mol. Des. 7 3-22. The search procedure is illustrated in Scheme A. The major changes in comparison to Ho & Marshall, 1993, supra include:

-   the query and database structures are both proteins;
-   the query elements are a distance matrix defining the topography of Cα-Cβ vectors, not individual atoms as in FOUNDATION; similarly, the database structure is defined as a Cα-Cβ vector distance matrix and not every atom as in FOUNDATION;
-   in FOUNDATION, a pair of atoms in a query is considered to match with a pair of atoms in an entry in the database if the atom-type and the distance between them are matched; in VECTRIX, a pair of Cα-Cβ vectors in a query is considered to match with a pair of Cα-Cβ vectors in an entry in the database if the four distances (Cα₁-Cα₂; Cα₁-Cβ₂; Cβ₁-Cα₂; Cβ₁-Cβ₂) between the pairs are matched; and
-   the FOUNDATION program performs the clique detection, steric filtering and subset filtering together and outputs the hits that satisfy the three criteria; by design, the VECTRIX program outputs all hits that have a number of matches greater than or equal to MIN_MATCH. POSTVEC is then used to filter those hits based on steric filtering, a new MIN_MATCH and subset consideration; by separating the clique detection hits and the filtering process, the VECTRIX program is more flexible.
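The four-distance matching rule lends itself to a short Python sketch. The code below is an illustration under stated assumptions rather than the VECTRIX source: a single tolerance `tol` is applied to all four distances, and `correspondence_edges` shows one common way of turning pairwise matches into a graph on which clique detection can then be run. All names and coordinates are hypothetical.

```python
import itertools
import math

def four_distances(v1, v2):
    """Return (CA1-CA2, CA1-CB2, CB1-CA2, CB1-CB2) for two (CA, CB) vectors."""
    (ca1, cb1), (ca2, cb2) = v1, v2
    return (math.dist(ca1, ca2), math.dist(ca1, cb2),
            math.dist(cb1, ca2), math.dist(cb1, cb2))

def pairs_match(q1, q2, e1, e2, tol=1.0):
    """A query pair matches an entry pair if all four distances agree within tol."""
    return all(abs(a - b) <= tol
               for a, b in zip(four_distances(q1, q2), four_distances(e1, e2)))

def correspondence_edges(query, entry, tol=1.0):
    """Nodes are (query index, entry index) pairings; edges join compatible pairings."""
    nodes = [(i, j) for i in range(len(query)) for j in range(len(entry))]
    edges = set()
    for (i, j), (k, l) in itertools.combinations(nodes, 2):
        if i != k and j != l and pairs_match(query[i], query[k],
                                             entry[j], entry[l], tol):
            edges.add(((i, j), (k, l)))
    return edges

def shifted(vector, shift):
    return tuple(tuple(c + s for c, s in zip(atom, shift)) for atom in vector)

query_pair = [((0.0, 0.0, 0.0), (0.0, 0.0, 1.5)),
              ((6.0, 0.0, 0.0), (7.5, 0.0, 0.0))]
entry_pair = [shifted(v, (10.0, 5.0, 5.0)) for v in query_pair]
# Same CA positions as entry_pair, but the second side chain points along z instead of x.
entry_bad = [entry_pair[0], ((16.0, 5.0, 5.0), (16.0, 5.0, 6.5))]

edges = correspondence_edges(query_pair, entry_pair)
```

Because the second entry in `entry_bad` keeps its Cα position but points its Cβ in a different direction, it fails the test even though the Cα-Cα distance is unchanged, which is the point of comparing four distances rather than one.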

An outline of a program written by the present inventors is shown in Scheme A.

The VECTRIX program requires four parameters: (1) query.file; (2) database.file; (3) steric.file; and (4) MIN_MATCH. The parameters are described in detail below.

(1) query.file

query.file (for example as in Table 1) contains the definition of the query, the definition of tolerance for each query atom and the definition of SUBSET. The three definitions are described below:

Query definition: Prior to running the VECTRIX program, a particular target protein is selected. The target protein's three-dimensional structure must have been determined by experimental or theoretical means well known in the art. The functional amino acids of the target protein must be defined and the Cα-Cβ vectors for those functional residues extracted to the query.file. Table 1 shows the definition of Cα-Cβ vectors of four functional residues. The numbers in columns 7, 8 and 9 represent the x, y and z coordinates of the vectors respectively.

Tolerance Definition: The tolerance defines the allowable uncertainty in the position of each atom. Note that the final tolerance of a vector from atom A to atom B is the sum of the individual tolerances of atom A and B. In Table 1, the tolerances for individual atoms are defined in column 10 to be 0.5 Å, so the tolerance for a distance between two atoms is 1.0 Å.
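The arithmetic of the tolerance rule is simple enough to state directly in Python. The function names are hypothetical; only the additive rule and the 0.5 Å example come from the text.

```python
def pair_tolerance(tol_a, tol_b):
    """Tolerance on a distance between atoms A and B is the sum of their tolerances."""
    return tol_a + tol_b

def distance_matches(query_d, entry_d, tol_a, tol_b):
    """True if the entry distance lies within the combined tolerance of the query distance."""
    return abs(query_d - entry_d) <= pair_tolerance(tol_a, tol_b)

# With 0.5 angstrom per atom, a 6.0 angstrom query distance accepts 5.0 to 7.0 angstroms.
combined = pair_tolerance(0.5, 0.5)
```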

Subset Definition: A list of atoms can be grouped into a SUBSET. The query file allows for the definition of as many SUBSETs as are required. The SUBSET definition will be used in the POSTVEC program to filter the hits to obtain more relevant hits. In Table 1, the first SUBSET command is defined as subset 1 and it consists of Cα-Cβ vector numbers 1, 3 and 4. The second SUBSET command is defined as subset 2 and it consists of Cα-Cβ vector number 2.
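One plausible way to use SUBSET definitions when filtering hits is sketched below in Python. The subset membership follows the Table 1 example described above, but the filtering rule itself (a required minimum number of matched vectors per subset) is an assumption made for illustration; the text does not spell out how POSTVEC applies subsets.

```python
def subset_match_counts(subsets, matched_vectors):
    """Count, for each subset, how many of its vectors appear in a hit."""
    matched = set(matched_vectors)
    return {sid: len(matched & set(members)) for sid, members in subsets.items()}

def passes_subset_filter(counts, required):
    """Hypothetical rule: each listed subset must reach its required match count."""
    return all(counts.get(sid, 0) >= n for sid, n in required.items())

# Subset 1 holds vectors 1, 3 and 4; subset 2 holds vector 2 (as in the Table 1 example).
subsets = {1: [1, 3, 4], 2: [2]}
counts = subset_match_counts(subsets, matched_vectors=[1, 2, 4])
```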

(2) database.file

database.file contains a list of file names that correspond with the entries constituting the database.

(3) steric.file

steric.file contains the coordinates of the grid points representing the ligand or receptor space. There are two forms of steric filtering depending on the availability of the 3D structure of a receptor or ligand. If the structure of the receptor is known and a query is from the Cα-Cβ vectors corresponding to the receptor-binding amino acid side chains of a ligand, then a hit must be evaluated in terms of whether it would invade the 3D space accessed by the receptor upon binding a cytokine, for example (receptor-based filtering). Alternatively, if the structure of the ligand is known and a query is from the Cα-Cβ vectors corresponding to the receptor-binding amino acid side chains of a ligand, then a hit must be evaluated in terms of whether it would invade the 3D space not occupied by the ligand (ligand-based filtering). The mode is identified in the first line of the ‘steric file’. The first step in our steric filtering algorithm is the calculation of the grid points that represent the ligand or receptor 3D space using the program PREPARE_STERIC_FILTER. The program first defines the limits of the structure by determining the maxima and minima in the x, y and z dimensions. Then, for each grid point (1 Å apart) within the limits, an xyz coordinate is output to a ‘steric file’ if the point is in steric contact with the receptor or the ligand.
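The grid construction described above might be sketched in Python as follows. This is not the PREPARE_STERIC_FILTER program: the text does not define "steric contact", so the sketch assumes a point is in contact when it lies within a hypothetical `contact_radius` of any atom centre.

```python
import math

def steric_grid(atoms, spacing=1.0, contact_radius=2.0):
    """Return grid points inside the bounding box that lie near at least one atom."""
    mins = [min(a[i] for a in atoms) for i in range(3)]
    maxs = [max(a[i] for a in atoms) for i in range(3)]
    # Number of grid steps that fit between the minimum and maximum on each axis.
    counts = [int(math.floor((maxs[i] - mins[i]) / spacing)) + 1 for i in range(3)]
    grid = []
    for ix in range(counts[0]):
        for iy in range(counts[1]):
            for iz in range(counts[2]):
                point = (mins[0] + ix * spacing,
                         mins[1] + iy * spacing,
                         mins[2] + iz * spacing)
                # Assumed contact rule: within contact_radius of any atom centre.
                if any(math.dist(point, a) <= contact_radius for a in atoms):
                    grid.append(point)
    return grid

# Two atoms 4 angstroms apart on the x axis; the midpoint is out of contact at radius 1.
demo_grid = steric_grid([(0, 0, 0), (4, 0, 0)], spacing=1.0, contact_radius=1.0)
```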

(4) MIN_MATCH

MIN_MATCH is an integer defining the minimum number of Cα-Cβ vectors that match between the query and the entry in the database required before VECTRIX will consider a clique as a hit.

Having entered the appropriate parameters, the first general step of the VECTRIX program is to calculate the distance matrix of the Cα-Cβ vectors of the query (see Scheme A). Each database entry is now read in turn and the Cα-Cβ distance matrix of the framework protein is calculated. The clique detection algorithm of Ho & Marshall, 1993, supra, is used to identify geometric matches between the query and the database entry. If no match is found, another database file is read and processed. If a hit is found, then some further processing is required because the clique detection algorithm only finds the entries with Cα-Cβ vectors that match those in the query. It does not check for steric integrity, that is, the structural complementarity that each hit possesses with regard to the 3D space in which it must reside. The VECTRIX program uses the ‘steric file’ to calculate the number of atoms in the hit which invade the receptor space or the non-ligand space, depending on whether it is in receptor-based or ligand-based filtering mode. Some parts of the framework protein are not essential to binding to the target protein via the ‘matched’ functional residues. The non-essential part includes the side chains that are not in the matches, and the N- or C-terminal residues up to the matched residue or the first cysteine residue. The essential atoms of a residue are the backbone atoms (N, H, CA, HA, C, O) and the side chain atoms that are attached to the CA atom (CB, 1HA and 2HA). The essential residues are between the first and the last cysteine. If no cysteine is found in the protein, the essential residues are defined to be between the first and the last matched residues. The VECTRIX program counts and outputs the number of essential atoms as well as the number of essential atoms that invade the receptor or non-ligand space. Furthermore, for each subset of vectors defined in the query file, the VECTRIX program counts and outputs the number of matched vectors in the subset. The results are written to an output file and another database entry is read and the process repeated until the end of the database is reached.
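The essential-residue rule and the invasion count can be sketched as follows. This is an illustrative Python outline and not the VECTRIX code: the function names and the `tol` distance threshold are hypothetical, and the sketch assumes that an atom "invades" receptor space when it lies close to a steric-file grid point, and invades non-ligand space when it lies close to none.

```python
import math

def essential_span(sequence, matched):
    """First and last essential residue indices (0-based, inclusive).

    sequence : one-letter amino acid string of the framework protein
    matched  : indices of residues whose Ca-Cb vectors matched the query
    """
    cys = [i for i, aa in enumerate(sequence) if aa == "C"]
    # Use the cysteines if there are any, otherwise the matched residues.
    anchors = cys if cys else sorted(matched)
    return anchors[0], anchors[-1]

def count_invading(atoms, grid, mode, tol=0.9):
    """Count atoms that invade forbidden space.

    mode "receptor": grid marks receptor space; atoms near a grid point invade.
    mode "ligand"  : grid marks ligand space; atoms far from every point invade.
    tol is an assumed distance threshold in Å (hypothetical).
    """
    def near_grid(atom):
        return any(math.dist(atom, g) <= tol for g in grid)

    if mode == "receptor":
        return sum(1 for a in atoms if near_grid(a))
    if mode == "ligand":
        return sum(1 for a in atoms if not near_grid(a))
    raise ValueError("mode must be 'receptor' or 'ligand'")

grid = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
essential_atoms = [(0.2, 0.0, 0.0), (5.0, 0.0, 0.0)]
invaded = count_invading(essential_atoms, grid, "receptor")
fraction = invaded / len(essential_atoms)  # fraction of essential atoms invading
```

The ratio of invading to essential atoms is the kind of quantity a later filtering step could compare against a maximum allowed invasion fraction.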

POSTVEC

By design, the VECTRIX program outputs all hits that have a number of matches greater than or equal to MIN_MATCH. The POSTVEC program is written for post-VECTRIX filtering. The filtering is based on the steric contact, a new number of matches and the count of matches in each SUBSET defined in the query.file. The POSTVEC program requires at least three parameters, i.e.

postvec vectrix_out.file min_match max_invade_fraction <subset1_num> <subset2_num> . . . <subsetX_num>

where:

-   vectrix_out.file is the name of the VECTRIX output file.
-   min_match represents the new minimum number of matches required.
-   max_invade_fraction defines the maximum allowable fraction of invasion of receptor/non-ligand space. That is, hits are rejected if the fraction of invasion is greater than max_invade_fraction, e.g. 0.1 for 10%.
-   subset1_num represents the number of matches required for subset 1.
-   subset2_num represents the number of matches required for subset 2.
-   the brackets <> denote optional parameters. That is, subset parameters are optional; if they are not defined then there is no subset filtering.

The output of POSTVEC is a set of pdb files of the filtered hits. These pdb files are in the same frame of reference as the query files, enabling simple display and comparison.

Examination of Hits using Insight II

An Insight II macro, EXAMINE_HIT.BCL, was written to enable easy viewing of the hits obtained from POSTVEC. Before using EXAMINE_HIT.BCL, an Insight II .psv file, EXAMINE.PSV, must be created. This file contains the ligand or the receptor in the same reference coordinates as the query vectors. It is used as the background to display the hits. Normally the ligand/receptor are set to dull colours and the query vectors are highlighted with thick lines, Cα coloured red, and Cβ coloured yellow. In Insight II, sourcing the EXAMINE_HIT.BCL file will allow for visualisation of the hits through the next and previous buttons, or through clicking on the filename of the hit. The hits are displayed together with the query and the receptor/ligand. Steric contacts and matched vectors are highlighted.

An alternative representation of the VECTRIX program is shown in Scheme B.

Alternatively, other applicable clique detection algorithms are provided by Brint & Willett, 1987, J. Mol. Graphics, supra and Brint & Willett, 1987, Chem. Inf. Comput. Sci, supra, which are herein incorporated by reference.

Using a series of automated scripts outlined in Scheme C, the database of small cysteine-rich proteins is updated weekly by searching the Brookhaven database for suitable candidates.

Suitably, said one or more hits correspond to respective entries identified by said algorithm according to said query.

Should there be more than one hit, it is desirable to evaluate and rank each hit. The most important factor in evaluating hits is “steric integrity”, or the 3D structural complementarity of a hit when compared to a query. Several algorithms have been developed which could be utilized for this purpose. Such algorithms would include an algorithm used by the FOUNDATION program, algorithms which check van der Waals overlap of each said hit with said query (Allinger et al., 1972, supra, which is herein incorporated by reference), or algorithms which calculate volume in common and volume of extra space with respect to each said hit and said query (Marshall et al., 1979, supra, which is herein incorporated by reference).

It is also contemplated that other algorithms may be useful. For example, simple distance calculations between said hit and said query after superimposition thereof may be used to identify 3D spatial differences therebetween.

An outline of the process that is currently used for scoring is given in Scheme D. These procedures post-process output data from the POSTVEC program, and these procedures may eventually be incorporated into the program to provide a semi-automated process. In the current filtering process, steps 1 and 2 evaluate the conformational stability of the engineered hit, and step 3 provides optimization of the fit between a receptor and hit. Note that this filtering process is described with reference to scoring hits in terms of their predicted interaction with a receptor, e.g. a cytokine and cytokine receptor. One skilled in the art will realize that the principles outlined in Scheme D are applicable to any protein-protein interaction. For example, when a crystal structure is not known, scoring procedures can be implemented to ensure that the hit is subsumed by the steric surface of the ligand.

It is also envisaged that evaluation and ranking of each said hit may be achieved manually by a person skilled in the art, although this would be a less preferred method, particularly when there is a plurality of hits to be evaluated and ranked.

In light of the foregoing, the skilled person will understand that the method of the invention provides framework protein “hits” which may be the subject of further modification.

As used herein in this context, a framework protein hit has “structural similarity” to a sample protein by virtue of possessing amino acid sequence similarity, topographical similarity and/or chemical similarity. For example, a framework protein “hit” has a surface topography and/or chemistry which is similar to that of a receptor-binding region of a cytokine. Substitution of framework protein amino acids by sample protein amino acids preferably increases the degree of similarity.

Preferably, a framework protein identified as a hit has greater stability than the sample protein.

As used herein in this context, “stability” includes resistance to degradation by proteolytic enzymes and/or temperature variation and/or resistance to denaturation by chaotropic agents and/or denaturing detergents, changes in pH, pH extremes, and/or REDOX extremes and/or changes.

It will be appreciated that the said two or more amino acids used for creating a query at step (iii) of the method of the invention constitute at least a portion of one or more functional regions of said sample protein. These amino acids may be the same as, or different to, said at least one amino acid used in modifying the hit.

In one embodiment, an amino acid sequence of a framework protein which corresponds to a hit is modified by substituting at least one amino acid residue thereof with at least one amino acid residue of said sample protein. Preferably, the said at least one amino acid of the sample protein is/are selected from those required for a function of said sample protein. This engineering process can involve addition, deletion or insertion of amino acids as desired.

As already discussed, the purpose of such modification is to impart a particular property, characteristic or function to a framework protein. The method of the invention takes account of the fact that the amino acid residues essential to a particular function will often be non-contiguous with respect to primary sequence. These “scattered” amino acid residues may nevertheless form at least a portion of one or more functional regions, each of which occupies a distinct location and orientation in 3D space.

Advantageously, modification of the framework protein hit will be performed so as to effectively “transfer” one or more functional region(s) of the sample protein thereto. Transfer is achieved by incorporating amino acid residues from one or more functional regions (as hereinbefore defined) of the sample protein into an amino acid sequence of a framework protein. Such modification will be performed so as to engineer a protein which incorporates amino acid residues of said one or more functional region(s) appropriately located and oriented in 3D space.

In an embodiment, said framework protein is modified to function as a cytokine mimetic. In this regard, modification of a framework protein may be performed so that said framework protein is capable of exhibiting a function similar to that of said sample protein (such as in the case of an agonist), or alternatively, so that it inhibits a function of said sample protein (such as in the case of an antagonist).

However, the scope of the present invention extends to engineering proteins with any desired function by substituting amino acid residues of a framework protein. For example, an enzyme might be engineered to catalyze conversion of a substrate, or a transcription factor may be engineered to bind its cognate DNA sequence and to form complexes with other transcription factors necessary to promote transcription.

In the case where a cytokine mimetic is to be engineered, a suitable approach is to modify an amino acid sequence of a framework protein (corresponding to a hit) by substituting amino acid residue(s) thereof with amino acid residue(s) of said cytokine selected from those amino acid residues which are required for binding of said cytokine to a specific receptor. Often, a biological response is elicited by a cytokine binding to two or more receptor molecules, thereby cross-linking said receptor molecules. A cytokine antagonist is therefore engineered by modifying a framework protein to include amino acid residues of a functional region required for binding one receptor molecule but not the other; an agonist is engineered by including amino acid residues of two functional regions, which together are required for binding and cross-linking of two receptor molecules. The functional regions required for binding said two receptor proteins occupy unique locations and orientations in 3D space. Engineering of an agonist therefore requires that the relative 3D location and orientation of each functional region is such that receptor binding and cross-linking is achievable.

In addition to direct substitution of amino acid residues of said cytokine selected from those amino acid residues which are required for binding of said cytokine to a specific receptor, several other design processes may be used. In cases where the atomic structures of the sample protein and its receptor are known, de novo design programs such as X-SITE (Laskowski et al., 1996, Journal of Molecular Biology, 175; Bohm, 1992, J. Comput. Aided. Mol. Des. 6 69, which are herein incorporated by reference) may be used to guide engineering of auxiliary binding epitopes into the hit that modulate activity. The auxiliary binding epitopes may be natural or unnatural amino acids that may be conjugated to additional functionality such as protecting groups used in synthetic peptide chemistry.

Programs that measure electrostatic similarity of mutated frameworks and the sample protein or electrostatic complementarity of the mutated framework and the sample protein receptor, such as DelPhi (Honig & Nicholls, 1987, ‘DelPhi’, Computer Program, Department of Biochemistry and Molecular Biophysics, Columbia University, which is herein incorporated by reference), may be employed to determine unmutated areas of the mutated framework that may be deleterious to activity.

Programs that measure buried surface areas, such as Naccess (Hubbard & Thornton, 1993, ‘NACCESS’, Computer Program, Department of Biochemistry and Molecular Biology, University College London, which is herein incorporated by reference) may be used to analyse and compare the buried surface areas of the sample protein and the mutated framework.

Often regions in proteins may be disordered and absent from the X-ray or NMR structure. When residues are absent in the binding region of the sample protein, techniques such as homology modelling and loop searching may be employed to construct a complete model of the atomic coordinates.

Whichever approach is taken, modification of said amino acid sequence of said framework protein requires that considerations of maintaining stereochemical and secondary structural integrity apply. It is therefore important to be able to predict any structural effects induced in said framework protein by such modification. This can be accomplished with algorithms well known to the art as described in Bowie et al., 1991, Science 253 164-170; Luthy et al., 1992, Nature 356 83-85 and Laskowski et al., 1993, J. Appl. Cryst. 26 283-91.

Preferably, a modified framework protein would be chemically synthesized. Alternatively, this may be achieved by chemically synthesizing a polynucleotide sequence which encodes an amino acid sequence of said modified framework protein. Techniques applicable to the chemical synthesis of proteins and nucleic acids are well known in the art, and an example of such a technique will be provided hereinafter.

Alternatively, a polynucleotide sequence which encodes an amino acid sequence of a framework protein corresponding to said hit may be modified by in vitro mutagenesis techniques, resulting in a modified polynucleotide sequence encoding an amino acid sequence of said modified framework protein. Suitable in vitro mutagenesis techniques are well known in the art, such as described in Chapter 8 CURRENT PROTOCOLS IN MOLECULAR BIOLOGY (Ausubel et al., Eds; John Wiley & Sons Inc., 1995), which is herein incorporated by reference. Phage display is also contemplated, which technique is well known in the art. An exemplary phage display method is provided in Smith et al., 1998, J. Mol. Biol. 277 317, which is herein incorporated by reference.

According to one embodiment of the invention, each said entry in the database corresponds to a small cysteine-rich protein of not more than 70 amino acid residues, initially represented in cartesian coordinate form, but subsequently processed into a distance matrix representation of Cα-Cβ vectors prior to searching. Said query is in the form of a distance matrix representation of Cα-Cβ vectors corresponding to amino acid side-chains of said sample protein, said amino acid side-chains being required for high-affinity binding of said sample protein to a receptor protein. In a particular embodiment, the sample protein is selected from the group consisting of GH, IL-4, G-CSF and IL-6.

In the case where said sample protein is human Growth Hormone (hGH), and said receptor protein is human Growth Hormone Receptor (hGHR), the Cα-Cβ vectors of hGH are a simplification of the 3D location and orientation of the amino acid side-chains of hGH which contact hGHR during high-affinity binding, and are required for such binding.

In this case, said small cysteine-rich protein corresponding to a hit is scyllatoxin, the amino acid sequence of which (shown in FIG. 1) is modified so that a protein produced with that amino acid sequence is potentially capable of functioning as an hGH antagonist. The particular Cα-Cβ vectors used in the search process were Asp A171; Lys A172; Glu A174; Thr A175; Phe A176; Arg A178; Ile A179; Lys A41; Leu A45; Pro A48; Glu A56; Arg A64; and Gln A68. The particular amino acid residues of hGH incorporated into the amino acid sequence of scyllatoxin were selected from those required for high-affinity binding of hGH to hGHR (as shown above) and which topographically matched with residues of scyllatoxin. Determination of which amino acids of scyllatoxin could be substituted without drastically affecting structural integrity was achieved with the assistance of the INSIGHT II modelling program.

The SCY01-SCY03 peptides, designed as potential hGH antagonists, were chemically synthesised with the respective amino acid sequences shown in FIG. 1.

In another case, said small cysteine-rich protein corresponding to a hit is a marine worm toxin (VIB). Said hit was identified by database searching using a query which comprised Cα-Cβ vectors of the following hGH amino acid residues: Lys A41; Leu A45; Pro A48; Glu A56; Arg A64; Gln A68; Asp A171; Lys A172; Glu A174; Thr A175; Phe A176; Arg A178; Ile A179; Arg A8; Leu A9; Asn A12; Leu A15; Arg A16; His A18; Arg A19; Tyr A103; Asp A116; Leu A117; Glu A119; and Thr A123.

An amino acid sequence of said hit (VIB) is shown in FIG. 2, and an amino acid sequence of a protein engineered by modifying one or more amino acids of said hit (VIB01) is also shown in FIG. 2. The particular amino acid residues of hGH used to modify said hit were selected from those forming the agonist-binding functional region of hGH as indicated in FIG. 2. Overlap between hGH and said marine worm toxin is shown in FIG. 3, which serves to emphasize the ability of the method of the invention to identify hits which match cytokine agonist functional regions.

The peptides designed according to the hGH agonist regions constitute candidate hGH agonists.

In light of the foregoing, it will be understood that the present invention contemplates engineered proteins such as according to the second-mentioned aspect.

In one embodiment, the amino acids of said another protein present in the engineered protein represent at least one functional region of said another protein.

In another embodiment, the amino acids of said another protein present in the engineered protein represent two functional regions of said another protein.

As well as providing amino acids which are non-contiguous in primary sequence, said another protein may also provide amino acids which are contiguous in primary sequence.

In one embodiment, the engineered protein has an amino acid sequence selected from the group consisting of SCY01, SCY02, SCY03, ERP01, ERP02, ERP03 and VIB01.

It will also be appreciated that according to both the first and second aspects of the invention, homologs of engineered proteins are contemplated. A person skilled in the art will realize that conservative amino acid substitutions, deletions and additions can be made such that a protein will retain a particular function notwithstanding such changes in amino acid sequence. All such homologs fall within the scope of the invention described herein.

In order that the present invention may be understood in more detail, the skilled person is directed to the following non-limiting examples.

EXAMPLES

Example 1

Overview of Database Search Strategy

A schematic description of the computational approach developed by the present inventors, program VECTRIX, is shown in FIG. 4. The first step involves the creation of a library of small cysteine-rich proteins. Currently, 344 such proteins (each with fewer than 70 amino acid residues) comprising over 3779 experimentally-derived 3D structures have been extracted from the BROOKHAVEN database. However, it would also be feasible to construct databases using theoretically derived features, such as by homology modelling, threading or other techniques known in the field.

Each structure is simplified, in turn, into Cα-Cβ vectors (step a), essentially resulting in a database of entries (step b). For the purposes of searching the database, each query is in the form of a distance matrix representation of Cα-Cβ vectors (step c). However, it is possible to represent Cα-Cβ vectors by other means, such as dihedral angles (δ) or α₁ and α₂ angles. A simple description of these types of representations with respect to a Cα-Cβ vector pair is shown in FIG. 5.
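To make the distance and dihedral representations concrete, the following Python sketch computes two distances and a torsion angle for one pair of Cα-Cβ vectors. It is an illustration only: FIG. 5 is not reproduced here, so the choice of the Cβ1-Cα1-Cα2-Cβ2 torsion for δ is an assumption, the α₁ and α₂ angles are not computed, and all function names are hypothetical.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral_deg(cb1, ca1, ca2, cb2):
    """Signed torsion angle in degrees about the Ca1-Ca2 axis.

    Assumes Ca1 and Ca2 are distinct points.
    """
    b1, b2, b3 = sub(ca1, cb1), sub(ca2, ca1), sub(cb2, ca2)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    norm_b2 = math.sqrt(dot(b2, b2))
    m1 = cross(n1, tuple(x / norm_b2 for x in b2))
    return math.degrees(math.atan2(dot(m1, n2), dot(n1, n2)))

def pair_descriptor(v1, v2):
    """Describe a pair of Ca-Cb vectors, each given as (ca_xyz, cb_xyz)."""
    (ca1, cb1), (ca2, cb2) = v1, v2
    return {
        "ca_ca": math.dist(ca1, ca2),
        "cb_cb": math.dist(cb1, cb2),
        "dihedral": dihedral_deg(cb1, ca1, ca2, cb2),
    }

# Two parallel vectors 1 Å apart: torsion of 0 degrees.
parallel = pair_descriptor(((0.0, 0.0, 0.0), (0.0, 1.0, 0.0)),
                           ((1.0, 0.0, 0.0), (1.0, 1.0, 0.0)))
# Second vector rotated by a quarter turn about the Ca-Ca axis.
twisted = pair_descriptor(((0.0, 0.0, 0.0), (0.0, 1.0, 0.0)),
                          ((1.0, 0.0, 0.0), (1.0, 0.0, 1.0)))
```

Because the torsion is signed, it distinguishes a vector pair from its mirror image, which a purely distance-based description cannot do.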

The search algorithm compares the distance matrix representing the query Cα-Cβ vectors with the distance matrix representing Cα-Cβ vectors of each entry (step d). Comparison of topographical similarities was chosen because Cα-Cβ vectors are common to all amino acid side chains (except glycine), and are essentially anchored to the backbone. They therefore represent the initial orientation of the amino acid side chain in 3D space, which would probably not undergo significant change upon interaction with another protein. It is envisaged that the extra atoms of the side chain will provide some degree of induced fit during such an interaction.

Alternative, more restricted approaches would use secondary structural features such as α-carbon backbone structures, together with suitable algorithms well known in the field (Holm & Sander, 1994, supra; Alexandrov, 1996, supra; Alexandrov & Fisher, 1996, supra; and Oreng, 1994, supra).

The intermolecular geometric relationship of Cα-Cβ vectors is compared using the clique-detection algorithm of Ho & Marshall, 1993, supra, which identifies hits according to a user-defined minimum number of matching vectors. However, other algorithms well known in the art would also be useful in this regard.
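The general idea of clique-based matching can be sketched in Python as below. This is a generic textbook formulation (a correspondence graph searched with the Bron-Kerbosch procedure), offered as a stand-in and not as the Ho & Marshall algorithm used by VECTRIX. The four end-point distances per vector pair, the `tol` tolerance and all names are assumptions for illustration.

```python
import math
from itertools import combinations

def pair_distances(v, w):
    """Four end-point distances between two Ca-Cb vectors, each (ca, cb)."""
    return tuple(math.dist(p, q) for p in v for q in w)

def largest_match(query, entry, tol=0.5):
    """Largest set of (query_index, entry_index) pairs with consistent geometry."""
    nodes = [(i, j) for i in range(len(query)) for j in range(len(entry))]

    def compatible(a, b):
        (i, j), (k, l) = a, b
        if i == k or j == l:          # each vector may be used only once
            return False
        dq = pair_distances(query[i], query[k])
        de = pair_distances(entry[j], entry[l])
        return all(abs(x - y) <= tol for x, y in zip(dq, de))

    adj = {n: set() for n in nodes}
    for a, b in combinations(nodes, 2):
        if compatible(a, b):
            adj[a].add(b)
            adj[b].add(a)

    best = []

    def expand(r, p, x):              # Bron-Kerbosch without pivoting
        nonlocal best
        if not p and not x:
            if len(r) > len(best):
                best = sorted(r)
            return
        for n in list(p):
            expand(r | {n}, p & adj[n], x & adj[n])
            p = p - {n}
            x = x | {n}

    expand(set(), set(nodes), set())
    return best

query = [((0.0, 0.0, 0.0), (0.0, 0.0, 1.5)),
         ((4.0, 0.0, 0.0), (4.0, 1.5, 0.0)),
         ((0.0, 7.0, 0.0), (1.5, 7.0, 0.0))]
shift = lambda p: tuple(c + 10.0 for c in p)
# Entry 0 is a distant decoy; entries 1-3 are the query translated by 10 Å.
entry = [((50.0, 50.0, 50.0), (50.0, 50.0, 51.5))] + \
        [(shift(ca), shift(cb)) for ca, cb in query]
match = largest_match(query, entry)
is_hit = len(match) >= 3              # compare against a MIN_MATCH of 3
```

Distances alone cannot tell a vector arrangement from its mirror image, and the exhaustive search shown here scales poorly, so a production implementation would need additional checks and pruning.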

As a result of step d, one or more hits may be identified. If a single hit is obtained, no ranking is necessary. If the number of hits is small, it may be possible for the skilled person to evaluate and rank each hit individually (step e). If, however, the number of hits is large, such manual comparison would be more difficult, and an automated process is required.

The most important factor in evaluating and ranking hits is steric integrity, that is, the structural complementarity that each hit possesses with regard to the 3D space in which it must reside. For example, if the query is in the form of a distance matrix representation of Cα-Cβ vectors corresponding to the receptor-binding amino acid side-chains of a hormone, then a hit must be evaluated in terms of whether it would invade the 3D space accessed by the receptor upon binding the hormone. Several algorithms have been developed that are useful for this purpose. For example, the FOUNDATION program of Ho & Marshall, 1993, supra uses various flood filling algorithms to define the 3D space occupied by the receptor (as determined from the crystal structure of the receptor), and then uses atom-checking routines to establish whether the atoms of a hit reside in the binding “cavity” of the receptor. Other approaches include placing molecules in a cube containing lattice points and checking the van der Waals overlap of each molecule (Allinger, 1972, In: Pharmacology and the future of Man. Proceedings of the 5th International Conference on Pharmacology pp 57-63). A related method involves the calculation of the volume in common and the volume of extra space of two molecules (Marshall et al., 1979, The Conformational Parameter in Drug Design: The active analog approach. 112 205).

It is also possible to use simple distance calculations between query and hit, after the two have been superimposed, to identify if the hit protrudes from the space occupied by the query structure. This is an approach the present inventors have implemented in an algorithm currently being constructed.

It is also important to be able to predict any drastic structural effects that may result from amino acid sequence changes when modifying a hit. This will, in part, be achieved by maximizing the degree of amino acid sequence identity of the modified hit with that of the protein (or area of the protein) to which the query corresponded. In addition, the stereochemistry and degree of secondary structure disruption of the modified hit can be evaluated using standard algorithms which check protein stereochemistry on an amino acid by amino acid basis. Similarly, secondary structure prediction algorithms can be used to evaluate the potential for an amino acid sequence modification of a hit to disrupt secondary structure.

Finally, the present inventors plan to utilize molecular surfaces to compare various physicochemical properties of a query and hit. Charge, electrostatic potential, hydrophobicity, occupancy, and hydrogen bonding potential have all been mapped to protein surfaces, providing detailed comparisons between proteins. A method for quantitating the degree of similarity between two molecular surfaces has been developed, in which a gnomonic projection casts the calculated values of a given property onto a spherical surface (Danziger & Dean, 1985, J. Theor. Biol. 116 215). Two such surfaces can then be superimposed using pairs of corresponding atoms. This algorithm would be very useful for comparing a query protein with a hit, to allow fine tuning of amino acid residues of the protein corresponding to the hit, and to improve steric and electrochemical complementarity.

Since the database searching algorithms (such as provided by the VECTRIX program) applicable to the method of the invention allow for the identification of partial hits, there is scope for a skilled person to use molecular modelling to identify additional regions on the surface of the protein corresponding to the partial hit for mimicking vectors missed in the database search. This could involve the use of D-amino acids or non-coded amino acids, for example, to achieve better mimicry when engineering a mimetic.

In the following examples, the VECTRIX program has been applied to various sample proteins.

Example 2

High Affinity hGH Antagonists

Growth hormone (GH) is a pituitary cytokine that regulates many growth processes, such as the growth and differentiation of muscle, bone and cartilage cells. The growth hormone receptor (GHR) consists of three domains:

-   (i) an extracellular domain that binds GH;
-   (ii) a transmembrane domain; and
-   (iii) a cytoplasmic domain involved in eliciting an intracellular signal upon cytokine binding.

Intracellular signalling occurs as a result of dimerization of separate GHRs following sequential binding of each receptor to a single GH ligand. The first GHR binds to the high affinity site of GH, while the second GHR subsequently binds to this complex. In support of this model, the crystal structure of this complex shows two identical receptor molecules bound to dissimilar sites on a single human GH molecule (hGH; De Vos et al., 1992, Science 255 306).

The high affinity site on hGH is concave and buries approximately 1200 Å² of surface area, while the second binding site on hGH buries approximately 900 Å² of surface area. A third region contributing to the stability of the complex comprises an area of 500 Å² buried by the receptor-receptor interaction.

The crystal structure also reveals that the actual contact areas of both the high affinity and low affinity sites of hGH are buried upon complexation with the receptors.

In developing antagonists of hGH, the present inventors have sought to design molecules that mimic the high-affinity binding of hGH. Mutagenesis studies of the amino acid residues within the high affinity binding site showed a dramatic decrease in affinity when certain of these amino acid residues were converted to alanine (Cunningham & Wells, 1993, 234 554). In this regard, of the 31 amino acid residues with buried side-chains, a mere eight (Lys A41; Leu A45; Pro A61; Arg A64; Lys A172; Thr A175; Phe A176; and Arg A178) accounted for approximately 85% of the total change in binding energy resulting from substitution by alanine. A further five residues (Pro A48; Glu A56; Gln A68; Asp A171; and Ile A179) essentially accounted for the remainder of the binding energy.

The GH residues currently used in the design of antagonists are: Asp A171; Lys A172; Glu A174; Thr A175; Phe A176; Arg A178; Ile A179; Lys A41; Leu A45; Pro A48; Glu A56; Arg A64; and Gln A68. It is these amino acid residues of hGH which formed the basis of the query for the purposes of database searching.

Scyllatoxin (pdb1scy) was returned as a hit framework that matched a maximum of 7 vectors of the hGH high affinity surface. After identification of a hit molecule, molecular modelling studies were used to optimise the hit, resulting in the design of SCY01, SCY02 and SCY03.

For example, molecular modelling studies (using INSIGHT II) suggested that the C-terminal His of the scyllatoxin-based mimetics could be removed as it does not interact with the receptor. This has advantages when synthesising the target molecule, as His has a potential to racemise during peptide assembly. As shown in FIG. 1, the mutated framework SCY01 was produced by transfer of 7 matching hGH residues, R167, K168, D171, K172, E174, T175 and F176. Similarly, SCY02 was designed by transfer of hGH residues D171, K172, E174, T175, F176, R178 and I179; however, the affinity matured hGH mutation E174S was incorporated into SCY02. Similarly, SCY03 incorporated the affinity matured hGH mutations D171S and E174S. In this fashion, several analogues were designed based on a single hit, that incorporated different functional residues and affinity matured residues.

In addition, molecular modelling techniques were used to optimise the amino acid functionality that was transferred to the new framework. Using the atomic structure of hGHR, X-SITE (Laskowski et al., 1996, supra) was used to predict binding sites for functional groups that could be incorporated into the hit peptide. Thus SCY13 was developed from SCY02 and SCY03 with the aid of the program X-SITE (Laskowski et al., 1996, supra), to incorporate novel mutations and auxiliary groups. As shown in FIG. 1, SCY13 possesses a D171Y mutation, a T175D mutation and an F176E(Fm) mutation. In addition, an N4R mutation in the native scyllatoxin sequence was also incorporated based on the X-SITE (Laskowski et al., 1996, supra) results. These mutations were incorporated to optimise the electrostatic interactions and to increase the bound surface area of the modelled SCY-hGHR complex.

Molecular modelling studies indicated that SCY01, SCY02 and SCY03 would bury approximately 700 Å² when bound to hGHR, whilst SCY13 would bury approximately 1000 Å² when bound to hGHR. The modelling program DelPhi (Honig, B. & Nicholls, A. (1987), ‘DelPhi’, Computer Program, Department of Biochemistry and Molecular Biophysics, Columbia University) was used to compare the electrostatic potential maps of hGH and SCY peptides, with the conclusion that there was good complementarity between hGH and SCY peptides.

The scyllatoxin peptides SCY01-SCY03 and SCY13 (FIG. 1) were then synthesised using solid phase techniques (M. Schnolzer et al., 1992, International Journal of Peptide and Protein Research, 40 180-193), purified and oxidised. The products were fully characterised using mass spectrometry, high performance liquid chromatography (HPLC) and amino acid analysis (AAA). The secondary structure elements of the engineered SCY molecules were determined by circular dichroism on SCY01 and SCY02 (FIG. 6). The spectra showed a high helical content consistent with the native SCY fold. In addition, CD indicated that the helical structure was unchanged by addition of helical stabilizing agents such as TFE, by destabilizing agents such as guanidine.HCl, or by changes in temperature. This emphasises the favourable chemical characteristics of these frameworks.

In order to determine whether the new engineered SCY framework mimics the structure of the region of GH used as a query, the structure of SCY01 was determined by NMR spectroscopy. As illustrated in FIG. 7, there is close conformational overlap (RMS 0.45 Å) between the functional residues on GH and the engineered surface of SCY01. The engineered framework therefore structurally matches the functional epitope of the target protein, which validates the design process: selecting a target protein, simplifying the functional epitope into Cα-Cβ vectors, using these vectors as a query to identify new frameworks that match the shape of the query, and then synthesising, folding and characterising the new engineered framework.

In order to characterise the folding patterns of SCY02 and SCY03, NMR experiments were again carried out. This time, however, the secondary shifts of the engineered and native SCY were compared (Wishart et al., J. Biomolecular NMR 5 67). As expected, there is little or no deviation in the CHα or NHα shifts compared to the native SCY molecule, indicating the correct fold and disulphide bond connectivity.

SCY01 was tested for biological function by bioassay using the BaF3 cell line, which normally responds to GH. The results are shown in FIG. 8. SCY01 was assayed at various concentrations for its ability to inhibit BaF3 cell proliferation in response to either 0.5 ng/mL hGH or, as a control, 50 Units/mL IL-3. The K_(i) calculated from these experiments was approximately 200 μM, and no inhibitory activity was observed with respect to IL-3 induced proliferation. Thus, SCY01 displayed an inhibitory activity with respect to GH-stimulated proliferation. This biological effect suggests that SCY01 is a candidate for further investigation with regard to its mechanism of action.

The SCY peptides showed extremely good stability in the hGH assay buffer, as judged by HPLC of the peptide at various time points during incubation in the assay buffer for up to 72 hrs. Preliminary studies evaluated the bioavailability of SCY01 by exposing it to a variety of proteases (trypsin, chymotrypsin and pepsin) and to blood serum proteins, as described in MATERIALS AND METHODS. The results of the blood serum stability test are presented in Table 2, and the results of the enzyme stability tests are presented in Table 3. The SCY peptide was found to be stable after 24 hrs in each case, while control peptides were rapidly digested. This emphasises the favourable chemical characteristics of disulfide-rich proteins.

In this example the present inventors have taken a functional epitope of hGH and successfully engineered it onto a new disulfide-rich framework. This framework has appealing chemical characteristics in terms of bioavailability and bioactivity when compared to macromolecular proteins.

Experimental to Example 2

Vectrix Results

Number of vectors searched: 15: R167, K168, D171, K172, E174, T175, F176, R178, I179, K45, P48, E56, R64, Q68.

Number of Different Frameworks Selected (Name:pdb Code Number Vector Matches):

Scyllatoxin: pdb1scy (7)

Synthesis

As described in the General Materials and Methods section. The peptides were fully characterized by mass spectrometry, reverse phase high performance liquid chromatography (RP-HPLC) and amino acid analysis (AAA).

Folding

The pure reduced peptides SCY01-03 were folded in a 0.1 M solution of NH₄HCO₃, stirred overnight at RT at a peptide concentration of ~0.3 μM per ml, and monitored by HPLC and mass spectrometry. The folded peptide was isolated by preparative HPLC. The correct disulphide connectivity for SCY01 was determined by full structure analysis by NMR. Folding methods using reduced and oxidized glutathione in a ratio of 100:10:1 GSH:GSSG:peptide, and published methods using 5 mM GSSG to 0.5 mM GSH in NaPO₄ buffer pH 7.4, were also carried out and gave identically folded material. Folding the crude peptide in exactly the same manner gave a yield equivalent to that obtained by folding the pure peptide. The oxidation of SCY13 was complicated by the Fm group attached to the Glu. SCY13 was oxidised using a 30% TFE solution in the presence of 5 mM GSSG to 0.5 mM GSH in NaPO₄ buffer pH 7.4.

Circular Dichroism (CD)

CD was performed as outlined in the General Materials and Methods section.

NMR

The NMR structure of SCY01 and the CHα and NHα connectivities were determined as outlined in the General Materials and Methods section.

Peptide Stability Tests

Stability in Assay Buffer

The SCY peptides showed extremely good stability in the hGH assay buffer (RPMI-1640 medium supplemented with 10% (v/v) foetal bovine serum (FBS) and 100 units/mL IL3). The peptides were incubated as 1 mg/ml solutions in the buffer at 37° C. Samples were removed at various time points and analysed by HPLC to follow peptide decomposition for up to 72 hrs.

Blood Serum

Blood was collected in heparinised tubes by venepuncture. The blood was centrifuged at 5000 rpm for 20 mins and the serum decanted. The blood serum was stored at −20° C. A sample of the blood serum (900 μL) was incubated with 100 μL of the stock peptide solution (1 mg/mL in H₂O) at 37° C. and aliquots (100 μL) were removed at the required times. A solution of 50% CH₃CN, 0.1% TFA was added to precipitate the blood serum proteins, and the mixture was centrifuged at 13000 rpm for 5 mins. A sample of the resulting solution (100 μL) was analysed by RP-HPLC (Vydac C18 218TP54, 250×4.1 mm id, 1%/min gradient H₂O/CH₃CN, 0.1% TFA) to detect peptide digestion.

Enzyme Stability Test

Trypsin

To the peptide solution (NH₄HCO₃, pH 8.3, 0.87 mg/mL) was added trypsin (5% w:v). Samples were incubated at 37° C. and aliquots removed at 0, 1, 3 and 18 hrs and analysed by RP-HPLC as above.

Chymotrypsin

To the stock peptide solution (100 μL) was added 900 μL NH₄HCO₃ (pH 8.3). Chymotrypsin was added to 5% w:v and the mixture incubated at 37° C. Aliquots were removed at 0 hr, 1 hr and 24 hrs and analysed by RP-HPLC.

Pepsin

To the stock peptide solution (100 μL) was added H₂O (800 μL) and 0.1 M HCl (100 μL) to pH 2.2. Pepsin was added to give a 1% w:v solution and the mixture incubated at 37° C. Aliquots were removed at 0 hr, 1 hr and 24 hrs and analysed by RP-HPLC.

Example 3

Growth Hormone—Low Affinity Site

The low affinity site of growth hormone comprises at least 12 residues. The Cα-Cβ vectors of these 12 residues were used in a VECTRIX search. Pdb1zdc (ZDC) was returned as the best hit, with 9 search vectors matched at 1 Å tolerance. These residues were R8, L9, D11, N12, L15, R16, R19, D116 and E119. Molecular modelling (Insight II) was again used to optimise the hit. It was decided that the R29L mutation (matching L9 of hGH) may disrupt the ZDC fold, and this mutation was not incorporated. Furthermore, additional molecular modelling studies suggested that ZDC could match a further 7 residues of hGH. The residues that matched (15 residues; backbone-atom RMSD between hit and hGH 1.46 Å) and were incorporated into ZDC05 were R8, D11, N12, L15, R16, R19, Y111, D112, K115, D116, E118, E119, G120, Q122 and T123. As shown in FIG. 9, the mutated framework ZDC05 was produced by transfer of the above 15 matching hGH residues.
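What it means for a search vector to be matched at a given tolerance can be illustrated with a minimal sketch. It assumes that the candidate framework has already been superimposed on the query, and that a vector counts as matched when both its Cα and its Cβ atom lie within the tolerance of the corresponding query atoms. This criterion, the function names and the coordinates below are illustrative assumptions only; the sketch does not reproduce the VECTRIX program.

```python
import math

# A Calpha-Cbeta vector is represented as a pair of 3D points: (ca, cb).

def distance(p, q):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def vector_matches(query, candidate, tolerance=1.0):
    """Assumed criterion: both end points lie within the tolerance (in angstroms)."""
    q_ca, q_cb = query
    c_ca, c_cb = candidate
    return distance(q_ca, c_ca) <= tolerance and distance(q_cb, c_cb) <= tolerance

def count_matches(query_vectors, framework_vectors, tolerance=1.0):
    """Greedily count query vectors matched by distinct framework vectors."""
    unused = list(framework_vectors)
    matched = 0
    for q in query_vectors:
        for i, c in enumerate(unused):
            if vector_matches(q, c, tolerance):
                matched += 1
                del unused[i]  # each framework residue is used at most once
                break
    return matched

# Hypothetical, pre-superimposed coordinates (not taken from any structure).
query = [((0, 0, 0), (1.5, 0, 0)),
         ((5, 0, 0), (5, 1.5, 0)),
         ((0, 5, 0), (0, 5, 1.5))]
framework = [((0.3, 0.2, 0), (1.6, 0.4, 0)),
             ((5.2, 0.1, 0.1), (5.1, 1.4, 0.3)),
             ((0, 9, 0), (0, 9, 1.5))]

n_matched = count_matches(query, framework, tolerance=1.0)
```

A hit would then be ranked by the number of matched vectors, as in the "Number Vector Matches" figures reported for each framework.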

Experimental to Example 3

Vectrix Results.

Number of vectors searched: 12: R8, L9, D11, N12, L15, R16, R19, D112, L113, D116, E119, T123.

Number of different matches at 7 or more vector matches: 22

Number of unique frameworks at 7 or more vector matches: 6

Number of Different Frameworks Selected (Name:pdb Code Number Vector Matches):

Protein A engineered fragment: pdb1zdc (9)

Example 4

Growth Hormone Agonist I

The agonist site of hGH comprises 25 residues. The Cα-Cβ vectors of these 25 residues were used in a VECTRIX search. Pdb1vib (VIB) was returned as the best hit, with 8 search vectors matched. These residues were N12, R16, R19, D171, K172, E174, T175 and F176. Molecular modelling determined that VIB could match a further 9 residues of hGH. The residues that matched (17 residues; backbone-atom RMSD between hit and hGH 0.86 Å) and were incorporated into VIB01 were D11, N12, R16, R19, L20, H21, Q22, L23, F25, R167, K168, D169, D171, K172, E174, T175 and F176. As shown in FIG. 2, the mutated framework VIB01 was produced by transfer of the above 17 matching hGH residues.

The modelling program DelPhi (Honig & Nicholls, 1987, supra) was used to compare the electrostatic potential maps of hGH and the mimics, with the conclusion that there was good complementarity between hGH and the mimics.

With the aid of molecular mechanics forcefield minimisations and molecular dynamics, VIB01 was determined to position the mutated residues in appropriate spatial orientations to mimic hGH and to retain the native fold.

The VIB peptide (FIG. 2) was synthesised using solid phase techniques (M. Schnolzer et al., International Journal of Peptide and Protein Research, supra), purified and oxidised. The product was fully characterised using mass spectrometry, HPLC and AAA. The secondary structure elements of the engineered VIB molecule were checked by circular dichroism, as illustrated in FIG. 10. The engineered VIB peptide had a very stable structure and showed significant helical character in aqueous conditions. This would be expected, as the native fold is a helix-loop-helix motif.

In addition, the VECTRIX search identified peptide ERP as a hit with 7 search vectors matched. These residues were N12, L15, R16, H18, R19, T175 and R178. Molecular modelling determined that ERP could match a further 6 residues of hGH. The residues that matched (13 residues; backbone-atom RMSD between hit and hGH 1.33 Å) and were incorporated into ERP01 were R8, D11, N12, M14, L15, R16, H18, R19, E174, T175, F176, R178 and I179. As shown in FIG. 11, the mutated framework ERP01 was produced by transfer of the above 13 matching hGH residues.

The modelling program DelPhi (Honig & Nicholls, 1987, supra) was used to compare the electrostatic potential maps of hGH and the mimics, with the conclusion that there was good complementarity between hGH and the mimics.

With the aid of molecular mechanics forcefield minimisations and molecular dynamics, ERP01 was determined to position the mutated residues in appropriate spatial orientations to mimic hGH and to retain the native fold.

ERP02 differed from ERP01 in containing the hGH affinity-matured mutations E174S, I179T and H18D. The G14F mutation (F176 mimic) in ERP01 and ERP02 necessitated two major mutations, S6G and N11G. ERP03 eliminated the G14F mutation and the need for these mutations, giving a less perturbed sequence.

The ERP peptides 01-03 (FIG. 11) were synthesised using solid phase techniques (M. Schnolzer et al., International Journal of Peptide and Protein Research, supra), purified and oxidised. The products were fully characterised using mass spectrometry, HPLC and AAA. The secondary structure elements of the engineered ERP molecules were checked by circular dichroism on ERP03 (FIG. 12). This showed a very high degree of alpha helical character, in agreement with the 3 helical bundle structure of the native ERP molecule.

NMR of ERP01 and ERP03 was carried out to check that the 3 disulfide bonds had formed correctly. As expected, there is only a small deviation from the native ERP molecule where the mutations to mimic the hGH molecule are made (FIG. 13 for ERP03). There is little or no deviation in the CHα or NHα shifts compared to the native ERP molecule, indicative of correct folding and disulphide bond connectivity, once again emphasising the ability to engineer new surfaces onto disulfide-rich peptides whilst maintaining the native fold.

Experimental to Example 4: VIB

Vectrix Results

Number of vectors searched: 25: R8, L9, N12, L15, R16, H18, R19, K41, L45, P45, E56, R64, Q68, Y103, D116, L117, E119, T123, D171, K172, E174, T175, F176, R178 and I179.

Number of Different Matches

61292 at minimum 5 vector matches

Number of Unique Frameworks

10 at minimum 7 vectors, 1 at minimum 8 vectors

Number of Different Frameworks Selected (Name:pdb Code: # Vector Matches)

Marine worm neurotoxin: pdb1vib (8)

Peptide Synthesis

Synthesis of the VIB peptides was as described in the General Materials and Methods section.

Oxidation of the VIB Peptides

The reduced VIB peptides were oxidised using the methods outlined for the ERP peptides, with 30% TFE solutions and a GSSG:GSH oxidation shuttle.

Circular Dichroism

CD was performed as outlined in the General Materials and Methods section.

Experimental to Example 4: ERP Molecule

Synthesis of ERP Peptides

As described in the General Materials and Methods section.

Folding of ERP Peptides

The peptide was dissolved at a low concentration in cold water, to which trifluoroethanol was added to 30%. This was cooled at 4° C. for two hours before oxidised and reduced glutathione were added (10:100:1 GSSG:GSH:peptide); 1 M NH₄HCO₃ was then added to give a 0.1 M solution at pH 8.1. The oxidised peptides were isolated by HPLC.

Circular Dichroism

CD was performed as outlined in the General Materials and Methods section.

NMR of ERP01 and 03

The NMR structure of ERP01 and ERP03 and the Cα-Cβ and Cα-NHα connectivities were determined as outlined in the General Materials and Methods section.

Example 5

Interleukin 4 (IL-4)

IL-4 is a four helix bundle cytokine that is the basis of the allergic response mechanisms in asthma, rhinitis, conjunctivitis and dermatitis. It plays an important role in the induction of immunoglobulins by switching on B-cells that produce IgM, IgE and IgGs. IL-4 associates primarily with the IL-4 alpha receptor, which accounts for nearly the complete binding affinity. The IL-4 receptor complex then recruits the common γ chain to form the cell signalling heterodimer.

The functional epitope of IL-4 that determines the binding affinity to the receptor α chain has been identified through mutational analysis and from the recently determined crystal structure of the complex of IL-4 and IL-4Rα (Hage et al., 1999, Cell 97 271). The key binding event involves mainly charged residues from helices A and C of IL-4, particularly Arg88 and Glu9.

The 13 amino acid residues of the binding surface of IL-4 were used as a query for the program VECTRIX. In this case the database to be searched contained the structure of GCN4, a 31 residue leucine zipper peptide. The GCN4 molecule was identified by VECTRIX as a hit, matching 8 vectors of IL-4 (RMS 0.39 Å). Upon engineering and synthesising this molecule containing these 8 amino acids, an IL-4 agonist is expected with a potency of Kd 106 μM (Domingues et al., 1999, Nat. Struct. Biol. 6 652).

An additional molecule, ZDC, was found that matches 10 vectors. Once synthesised, the engineered framework will be folded and assayed.

Vectrix Results

Number of vectors searched: 13: K77, R81, K84, R85, R88, N89, W91, T13,E9, I5, R53, F82, K12

Total Number of Different Matches at 7 or More

396

Number of Unique Frameworks

30

Number of Different Frameworks Selected (Name:pdb Code: # Vector Matches)

GCN4 peptide: pdb1zta (8)

Protein A fragment (engineered): pdb1zdc: (10)

N.B. No molecule selected in the search matched to Arg53.

Example 6

CD4 GP120

The CD4-GP120 interaction is the primary binding event that allows the Human Immunodeficiency Virus (HIV) to enter a cell. The crystal structure of CD4 has been known for some time (Wang et al., 1990, Nature 348 411), but a structure of the complex of CD4 and a highly modified GP120 was only solved in June 1998 (Kwong et al., 1998, Nature 393 648). It has been known for some time through mutational analysis of CD4 (Fleury et al., 1991, Cell 66 1037) that the key amino acids involved in binding to GP120 reside on a loop (CDR1) involving residues 41-47, together with the key binding residue Arg59.

The Cα-Cβ vectors of these residues were used in a VECTRIX search. Two molecules, SCY and PTA (FIG. 14), were identified as potential matches. Both molecules were optimised using a design procedure as described above.

The biological activity of SCY is consistent with the studies of Vita et al., 1998, Biopolymers 47 93.

Experimental for Example 6

Vectrix Results

Number of vectors searched: 7: K35, S42, F43, R59, D63, Q40, L44.

Total Number of Different Matches

At 4 or more matches. 409

Number of Unique Frameworks

116

Number of Different Frameworks Selected (Name:pdb Code: # Vector Matches)

Scorpion neurotoxin: pdb2pta (5)

Scyllatoxin: pdb1scy: (4)

The SCY molecule is only selected in the VECTRIX search if the absolute requirement of a match with Arg59 is removed.

Synthesis of PTACD4 and SCYCD4 Molecules

As described in the General Materials and Methods section.

Oxidation of PTACD4 Molecule

The PTA peptide was oxidised by stirring the peptide overnight in 0.1 M NH₄HCO₃ pH 8.1. The oxidised peptide (2 forms) was recovered by HPLC. Both folded forms were assayed separately. Oxidation of the peptide under different conditions in the presence of glutathione failed to yield folded peptide.

Oxidation of SCY Molecule

The SCYCD4 molecule was oxidised using 5 mM GSSG to 0.5 mM GSH in NaPO₄ buffer pH 7.4. The oxidised peptide was purified by HPLC.

Biacore Assay

GP120 is bound to a CM-5 Biacore chip through NHS coupling. CD4 is then passed over the GP120 surface and the degree of binding assessed through both the on rate k_(Association) and the off rate k_(Dissociation). CD4 is then equilibrated with the inhibitor ligand and passed over the GP120. Through the Biacore module, the degree to which the PTA or SCY ligand disrupts the binding of CD4 to the chip is assessed.

Example 7

Interleukin 6 (IL-6)

Interleukin 6 (IL-6) is a cytokine that plays an important role in the inflammation cascade, neural development, bone metabolism, hematopoiesis, cell proliferation and immune response mechanisms. IL-6 is a 4 helical bundle cytokine that binds to an IL-6 alpha receptor and to a common receptor motif, GP130. The IL-6Rα subunit does not play a role in intracellular signalling; this is carried out through the ligand-dependent dimerisation of the associated GP130 receptor molecule. The full receptor complex is believed to be hexameric, with two units each of IL-6, IL-6R and GP130. The pleiotropic effects of IL-6 are thought to come about because of this complex arrangement of the heterotrimeric receptor complex. The interaction sites for both the IL-6Rα and GP130 receptors have been well studied through site-specific mutagenesis of both the receptor molecules and the IL-6 molecule. The structure of IL-6 in both solution and crystal forms has been solved, and the crystal structure of the GP130 receptor has recently been determined.

The IL-6α receptor binding site on IL-6 (termed Site I) is localised primarily to the end of helix D. Two additional sites, Site II and Site III, are responsible for binding the two different GP130 receptor molecules. The two GP130 binding sites are spread over a wide area at the opposite end of the molecule to the IL-6α receptor binding site.

The IL-6 VECTRIX search described herein pertains only to the IL-6α receptor interaction. It does not relate to the GP130 receptor interaction or the multi-receptor interactions (though the VECTRIX search has also been carried out for Sites II and III). No modelling of the IL-6 residues onto any of the hit frameworks has been carried out. A few examples of possible framework targets are listed below.

Vectrix Results

Number of vectors searched: 21. Subset 1 (Site I): 8 vectors. Subset 2 (Site II and III): 13 vectors.

Number of Different Matches at 8 and Above Matches for Site I

179

Number of Unique Frameworks

29

Number of Different Frameworks Selected (Name:pdb Code: # Vector Matches)

Protein A fragment (engineered): pdb1zdc: (9)

Moloney murine leukemia virus fragment: Pdb1mof: (10)

Scyllatoxin: pdb1scy: (8)

Example 8

G-CSF

Granulocyte Colony Stimulating Factor (G-CSF) belongs to the class of 4 helical bundle cytokines or growth factors. It is involved in the promotion of cell proliferation and differentiation leading to the production of mature neutrophils. Its ability to replenish these neutrophils in vivo makes it an attractive drug target. G-CSF functions through receptor dimerisation of the CSF receptor. Alanine scanning mutagenesis has been carried out on G-CSF to identify the key residues involved in receptor recognition. The crystal structure of G-CSF has been available since 1993 (Hill et al., 1993, Proceedings of the National Academy of Sciences USA 90 5167) and the NMR structure since 1994 (Zink et al., 1994, Biochemistry 33 8453).

The VECTRIX search was done with an absolute requirement for a vector matching the critical amino acid Phe 145. However, relatively few hits resulted, presumably because every hit was required to match the Phe 145 vector. Relaxing this absolute requirement and refining the VECTRIX search are expected to yield a larger number of hits.

Experimental to Example 8

Vectrix Results

Number of vectors searched: 18

Number of Different Matches

338

Number of Unique Frameworks

115

Number of Different Frameworks Selected (Name:pdb Code: # Vector Matches)

Further refinement of the VECTRIX search is needed before probable ligand frameworks can be selected.

GENERAL MATERIALS & METHODS

Design

Database searching and all design steps were carried out on either an R10000 or R12000 SGI Octane workstation. Database searching was performed with VECTRIX. Visualisation, and peptide mutations and modifications, were performed using the InsightII and Biopolymer programs, respectively, from Biosym/MSI of San Diego. Analysis of the electrostatic potential character of the molecules was carried out using DelPhi (Biosym/MSI of San Diego), while surface area calculations were performed with Naccess (Hubbard & Thornton, 1993, ‘NACCESS’, Computer Program, Department of Biochemistry and Molecular Biology, University College London). Molecular mechanics minimisations and molecular dynamics calculations were performed on the mutated frameworks to determine whether the native fold was retained. Programs such as X-SITE (Laskowski et al., 1996, Journal of Molecular Biology, p175-201) were used to add additional functionality to the mutated peptides.

Chemicals and Reagents

Trifluoroacetic acid (TFA), dichloromethane (DCM), dimethylformamide (DMF) and diisopropylethylamine (DIEA) were from Auspep (Melbourne, Australia). 2-(1H-benzotriazol-1-yl)-1,1,3,3-tetramethyluronium hexafluorophosphate (HBTU) was from Richelieu Biotechnologies (St. Hyacinth, Quebec, Canada). Acetonitrile was from BDH Laboratory Supplies (Poole, U.K.), diethyl ether from Fluka Biochemicals (Melbourne) and 2-mercaptoethanol from Sigma (St. Louis, Mo., USA). Trifluoroethanol was from Aldrich (Milwaukee, Wis., USA). HF was purchased from Boc Gases (Brisbane, Australia). The following Nα-Boc protected L-amino acids were purchased from either NovaBiochem (La Jolla, Calif., USA) or Bachem (Switzerland): Ala, Gly, Ile, Leu, Phe, Pro, Val, Arg (Tos), Asp (OChx), Asn (Xanth), Glu (OChx), His (DNP), Ser (Bzl), Thr (Bzl) and Tyr (2BrZ). MBHA polystyrene resin was purchased from Peptide Institute (Kyoto, Japan).

HPLC Methods

Analytical and preparative HPLC was carried out using a Waters HPLC system comprising a model 600 solvent delivery system, a 600E controller and a model 484 detector. Vydac C18 and C4 columns were used: analytical (4.6×250 mm id) at a flow rate of 1 ml/min, semi-preparative (10×250 mm id) at a flow rate of 3 ml/min and preparative (22×250 mm id) at a flow rate of 8 ml/min. All peptides were purified using linear gradients of 0.1% aqueous TFA (solvent A) and 90% aqueous acetonitrile with 0.09% TFA (solvent B).

Peptide Synthesis

Peptides were synthesized using the rapid manual HBTU in-situ neutralization synthesis techniques (Schnolzer et al., 1992, supra) on a modified ABI 430A peptide synthesizer (Alewood et al., 1997, supra). The peptide was synthesized on an MBHA resin on a 0.2 mmol scale using 0.79 mmol/g NH₂ substituted resin. Each amino acid was double coupled using 2 mmol AA, 0.48 M HBTU (4 ml) and 1 ml DIEA for 10 min each coupling. The Boc group was removed by 2×1 min treatments with TFA, with 1 min DMF flow washes of the resin.

At the completion of the synthesis the His(DNP) group, if present in a particular sequence, was removed using 3×30 min treatments with 20% mercaptoethanol in 10% DIEA/DMF solution. Peptide resin was cleaved using HF with p-cresol and p-thiocresol (90:8:2) as scavengers at −5 to 0° C. for 2 hrs. If Trp(CHO) is present in a sequence, it is removed by treatment with ethanolamine. The HF was removed in vacuo, the peptides were triturated with cold diethyl ether (3×50 ml), and the precipitated peptide was collected and then dissolved in 50% acetonitrile with 0.1% TFA to give the crude peptide. The crude peptide (~80 mg lots) was purified by RP-HPLC, and fractions were collected and analysed by analytical RP-HPLC and ESMS. Fractions containing the purified peptide were combined and lyophilised.

Mass spectral data were collected using a Perkin Elmer Sciex (Toronto, Canada) API III Biomolecular Mass Analyzer ion-spray mass spectrometer equipped with an ABI 140B solvent delivery system. Raw data were analyzed using the program MassSpec (Perkin Elmer Sciex). Calculated masses were obtained using the program MacProMass (Sunil Vemuri & Terry Lee, City of Hope, Duarte, Calif.).

Ultraviolet Circular Dichroism (CD)

Far UV-CD spectra were recorded using a Jasco 710 CD spectrometer with associated PC-based software. CD spectra are presented as a plot of mean molar ellipticity per residue [θ] (deg cm² dmol⁻¹) versus wavelength in 0.1 nm increments. The digitised data were plotted using the KaleidaGraph program on a Macintosh. All peptide concentrations were determined by quantitative amino acid analysis.
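For reference, the conversion from an observed ellipticity to the mean molar ellipticity per residue plotted in such spectra can be sketched as follows. The relation used is the conventional one, [θ] = θ_obs / (10 · c · l · n), with θ_obs in millidegrees, c in mol/L, l in cm and n the number of residues (some conventions use the number of peptide bonds instead). The function name and the numerical values are hypothetical and are not measurements from these experiments.

```python
def mean_residue_ellipticity(theta_mdeg, conc_molar, path_cm, n_residues):
    """Return [theta] in deg cm^2 dmol^-1 per residue.

    theta_mdeg : observed ellipticity in millidegrees
    conc_molar : peptide concentration in mol/L
    path_cm    : cuvette path length in cm
    n_residues : number of residues in the peptide
    """
    return theta_mdeg / (10.0 * conc_molar * path_cm * n_residues)

# Hypothetical reading: -20 mdeg for a 31-residue peptide at 50 uM in a 0.1 cm cell.
example = mean_residue_ellipticity(-20.0, 50e-6, 0.1, 31)
```

Because the result scales inversely with concentration, the accuracy of [θ] depends directly on the concentration determined by amino acid analysis.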

¹H NMR Spectroscopy

All NMR experiments were recorded on a Bruker ARX 500 spectrometer equipped with a Z-gradient unit. Peptide concentration was approximately 3 mM in 95% H₂O/5% D₂O (T=293K). Spectra recorded included NOESY (Kumar et al., 1980, Biochem. Biophys. Res. Comm. 95 1; Jeener et al., 1979, 71 4546) with a mixing time of 400 milliseconds, and TOCSY (Bax & Davis, 1985, 65 355) with a mixing time of 85 milliseconds. Spectra were run over 5550 Hz with 4K data points, 512 FIDs, 32-64 scans and a recycle delay of 1 s. The solvent was suppressed using the WATERGATE sequence (Piotto et al., J. Biomol. NMR, 1992, 2 661). Spectra were processed using UXNMR. FIDs were multiplied by a polynomial function and apodised using a 90° shifted sine-bell function in both dimensions prior to Fourier transformation. Baseline correction using a 5^(th) order polynomial was applied and chemical shift values were referenced externally to DSS at 0.00 ppm. The random coil H chemical shift values of Wishart et al., 1995, J. Biomol. NMR 6 135, were used. Spectra were assigned using the methods of Wüthrich et al., 1986, NMR of Proteins and Nucleic Acids, Wiley-Interscience, NY.

Growth Hormone Proliferation Assay

BaF-B03 cells (a pro B cell line) that stably express the human Growth Hormone Receptor (hGHR) are used in this assay since they are able to elicit a GH-specific response at concentrations as low as 0.1 ng/mL hGH (4.54 pM). These cells also endogenously express the IL3 receptor and require IL3 or GM-CSF to survive in culture. The assay is based on that of Mosmann, 1983, J. Immunol. Meth. 65 55, and involves the following procedure:

-   (i) culture cells in RPMI-1640 medium supplemented with 10% (v/v) foetal bovine serum (FBS) and 100 units/mL IL3 under 5% CO₂ at 37° C. Allow the culture to reach mid-log growth phase;
-   (ii) centrifuge cells at 500×g and wash with PBS to remove IL3 from the culture medium. Repeat the centrifugation and resuspend in 1 mL of RPMI-1640 plus 0.5% (v/v) FBS. Count cells and dilute to a concentration of 8×10⁵ cells/mL in the same medium;
-   (iii) from a constantly stirred suspension, add 50 μL of cells to each well of two 96 well plates;
-   (iv) prepare stock solutions of the mimetic to be tested at various concentrations such that the final concentration ranges from 100 nM to 100 μM, made up in 0.5% FBS media (final volume is 150 μL, therefore stocks should be 3 times the final concentration required). Add 50 μL of these solutions to cells in sextuplicate (i.e. A1 to A6 are identical etc.);
-   (v) prepare a stock solution (3 times) of hGH such that the final concentration is 0.5 ng/mL and add 50 μL to each well of one plate. Include one row as a negative control with no cytokine;
-   (vi) prepare a stock solution (3 times) of IL-3 such that the final concentration is 50 units/mL and add 50 μL to each well of the other plate. Include one row as a negative control with no cytokine;
-   (vii) incubate plates with no lids (to prevent uneven evaporation rates) in a vented humidified box under the abovementioned incubation conditions. Allow incubation to continue for 24 hrs;
-   (viii) add 50 μL of 4 mg/mL MTT (3-[4,5-dimethylthiazol-2-yl]-2,5-diphenyltetrazolium bromide) to each well and incubate for a further 3 hrs;
-   (ix) to stop the assay, remove from incubator and lyse cells by adding 120 μL of isopropanol and triturating for several seconds per well or until cells are clearly lysed. Allow plate to rest in the dark for 5 minutes before reading;
-   (x) read plate at 595 nm on a microplate reader. Values obtained are directly proportional to cell number (as measured by mitochondrial dehydrogenase levels).
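The concentration arithmetic used in the procedure above can be checked with a short sketch. It shows why stocks are made at 3 times the final concentration (50 μL added to a final volume of 150 μL) and how a mass concentration of hGH converts to a molar one. The function names are illustrative, and the molecular weight of approximately 22,000 Da for hGH is an assumption chosen to be consistent with the stated equivalence of 0.1 ng/mL and 4.54 pM.

```python
def stock_concentration(final_conc, added_volume_ul=50.0, final_volume_ul=150.0):
    """Stock concentration needed so that the added volume gives final_conc in the well."""
    return final_conc * final_volume_ul / added_volume_ul

def ng_per_ml_to_pM(conc_ng_per_ml, mol_weight_da):
    """Convert a mass concentration (ng/mL) to picomolar."""
    grams_per_litre = conc_ng_per_ml * 1e-6   # 1 ng/mL equals 1e-6 g/L
    molar = grams_per_litre / mol_weight_da   # mol/L
    return molar * 1e12                       # pmol/L

HGH_MW_DA = 22000.0  # assumed approximate molecular weight of hGH

hgh_stock = stock_concentration(0.5)           # ng/mL stock for 0.5 ng/mL final
hgh_low_pM = ng_per_ml_to_pM(0.1, HGH_MW_DA)   # approximately 4.5 pM
```

The same factor of three applies to the mimetic and IL-3 stocks, since each is added as 50 μL into the same 150 μL final volume.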

CONCLUSIONS

These studies have shown that by engineering small, cysteine-rich proteins, a stable mimetic with high bioavailability can be made with desired biological characteristics, in this case the ability to antagonize the biological action of hGH. Furthermore, the database searching strategy of the present invention has shown that suitable “frameworks” for engineering mimetics can be identified according to aspects of structure which are shared with a sample protein that possesses a function of interest. The framework so identified will advantageously have increased stability compared to the sample protein. Finally, frameworks identified by the method of the invention may be suitable for further amino acid sequence modification so as to impart a function of the sample protein, or a function antagonistic thereto.

The present invention therefore provides a new strategy for the engineering of proteins, which strategy is particularly applicable to the engineering of mimetics which may constitute the next generation of therapeutics.

It will be understood by the skilled person that the invention is not limited to the particular embodiments described in detail herein, but also includes other embodiments consistent with the broad spirit and scope of the invention.

TABLE 1 An example of a query file which defines the query Cα-Cβ vectors, the tolerance for each query atom and the definition of subsets

ATOM  344  Cα  LYS  A   41  54.743  11.420  29.859  0.50  33.97
ATOM  347  Cβ  LYS  A   41  53.280  11.410  30.298  0.50  36.33
ATOM  382  Cα  LEU  A   45  58.116  17.055  29.052  0.50  30.56
ATOM  385  Cβ  LEU  A   45  56.870  17.340  29.906  0.50  27.80
ATOM  274  Cα  GLU  A  119  43.893  28.064   0.887  0.50   0.00
ATOM  277  Cβ  GLU  A  119  43.099  27.286  −0.137  0.50   0.00
ATOM  296  Cα  THR  A  123  41.789  33.792   1.008  0.50   0.00
ATOM  299  Cβ  THR  A  123  40.586  32.811   0.784  0.50   0.00
Subset 1, 3, 4.
Subset 2.
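The query records of Table 1 resemble PDB ATOM records, and the way Cα-Cβ vectors might be derived from them can be sketched as follows. The sketch assumes whitespace-separated fields, and assumes that the column reading 0.50 holds the per-atom tolerance mentioned in the table title; the final column is not interpreted. The parser and its function names are illustrative and are not part of VECTRIX.

```python
import math

# First two residues of Table 1, with fields separated by whitespace.
QUERY = """\
ATOM 344 Cα LYS A 41 54.743 11.420 29.859 0.50 33.97
ATOM 347 Cβ LYS A 41 53.280 11.410 30.298 0.50 36.33
ATOM 382 Cα LEU A 45 58.116 17.055 29.052 0.50 30.56
ATOM 385 Cβ LEU A 45 56.870 17.340 29.906 0.50 27.80
"""

def parse_query(text):
    """Group atoms by residue: {(res_name, chain, res_num): {atom: (xyz, tolerance)}}."""
    residues = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 11 or fields[0] != "ATOM":
            continue  # skip subset definitions and anything malformed
        atom, res_name, chain = fields[2], fields[3], fields[4]
        res_num = int(fields[5])
        xyz = tuple(float(v) for v in fields[6:9])
        tolerance = float(fields[9])  # assumed meaning of this column
        residues.setdefault((res_name, chain, res_num), {})[atom] = (xyz, tolerance)
    return residues

def ca_cb_vectors(residues):
    """Return the Cα-to-Cβ displacement vector for each residue having both atoms."""
    vectors = {}
    for key, atoms in residues.items():
        if "Cα" in atoms and "Cβ" in atoms:
            ca, cb = atoms["Cα"][0], atoms["Cβ"][0]
            vectors[key] = tuple(b - a for a, b in zip(ca, cb))
    return vectors

vectors = ca_cb_vectors(parse_query(QUERY))
lengths = {key: math.sqrt(sum(c * c for c in v)) for key, v in vectors.items()}
```

Each vector has a length of roughly 1.5 Å, the Cα-Cβ bond length, so the information used to discriminate between frameworks lies in the relative positions and directions of the vectors rather than in their magnitudes.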

TABLE 2 Blood serum stability test results

Peptide: 0 hr; 1 hr; 24 hrs
Control peptide: partially digested after 3 mins; fully digested
SCY01: stable; stable; stable

TABLE 3 Enzyme stability test results

Enzyme: Control peptide; SCY01
Trypsin: Digested in 1 hr; Stable over 18 hrs
α-Chymotrypsin: Digested in 1 hr; Stable over 18 hrs
Pepsin: Digested in 1 hr; Stable over 18 hrs

SCHEME D

1) Framework hit mutated to sample molecule residues: Using the Biopolymer module of InsightII, the residues matching the sample molecule are mutated to the residue type of the sample molecule. Using the Search_compare module of InsightII, the sidechains of the mutated residues of the framework hit are flexibly fitted to the sidechains of their corresponding sample molecule residues to produce a theoretical ‘Bioactive conformation’. A bump-check with the receptor is performed to identify steric clashes of unmutated sidechains with the receptor. Investigate mimicking unmatched functional residues of the sample molecule with unnatural amino acids.

2) Conformational stability of theoretical ‘bioactive conformation’: The ‘bioactive conformation’ from above is minimised in a forcefield. The rmsd of the backbone atoms of the minimised mutated framework hit from the minimised unmutated framework hit is calculated. If the rmsd < 2.0 Å, the conformation is considered accessible.
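The accessibility criterion of step 2 can be written out as a short calculation. The sketch below assumes that the two backbone coordinate sets list the same atoms in the same order and have already been superimposed; no fitting is performed here. The names backbone_rmsd and is_accessible are hypothetical; the 2.0 Å threshold is the value stated in step 2.

```python
import math

# Accessibility threshold in Å, as stated in step 2.
RMSD_THRESHOLD = 2.0


def backbone_rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered lists of
    (x, y, z) backbone coordinates. Assumes prior superposition."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def is_accessible(mutated, unmutated, threshold=RMSD_THRESHOLD):
    """True if the minimised mutated backbone stays within the threshold
    of the minimised unmutated backbone."""
    return backbone_rmsd(mutated, unmutated) < threshold


if __name__ == "__main__":
    # Toy coordinates: every atom displaced by 1 Å along x.
    unmutated = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    mutated = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    print(backbone_rmsd(mutated, unmutated), is_accessible(mutated, unmutated))
```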

3) Stability of fold: The mutated framework hit and the native framework hit are subject to molecular dynamics at 300 K. The rmsd of minimised trajectory intermediates from the original conformer in each dynamics run are plotted. Unless there is a significantly greater drift in rmsd of the mutated relative to native framework hit, the fold is considered stable.
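One way to make the comparison of rmsd drift in step 3 concrete is to fit a straight line to each rmsd-versus-time series and compare the slopes. This is only an illustrative sketch: the text does not quantify "significantly greater", so the margin DRIFT_MARGIN below is a hypothetical placeholder, as are the function names and the assumed units (Å for rmsd, picoseconds for time).

```python
# Hypothetical tolerance on the difference in drift (Å per ps); the text
# gives no numerical definition of "significantly greater".
DRIFT_MARGIN = 0.005


def rmsd_drift(times, rmsds):
    """Least-squares slope of rmsd against time."""
    n = len(times)
    if n != len(rmsds) or n < 2:
        raise ValueError("need at least two matching time/rmsd points")
    mean_t = sum(times) / n
    mean_r = sum(rmsds) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (r - mean_r) for t, r in zip(times, rmsds))
    return sxy / sxx


def fold_is_stable(times, mutated_rmsds, native_rmsds, margin=DRIFT_MARGIN):
    """True unless the mutated hit drifts faster than the native hit
    by more than `margin`."""
    excess = rmsd_drift(times, mutated_rmsds) - rmsd_drift(times, native_rmsds)
    return excess <= margin


if __name__ == "__main__":
    # Toy series, not simulation output.
    times = [0.0, 10.0, 20.0, 30.0]
    native = [0.0, 0.1, 0.2, 0.3]
    mutated = [0.0, 1.0, 2.0, 3.0]
    print(fold_is_stable(times, mutated, native))
```

In practice the plotted curves would normally be inspected as well, since a single slope hides plateaus and late-onset unfolding.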

4) Electrostatic similarity to target: Electrostatic isocontour surfaces are generated for both mutated framework hit and sample molecule and compared for similarity. Electrostatic fields are mapped onto the solvent accessible surface of mutated framework hit and sample molecule to compare electrostatic properties at the contact surface.

5) Synthesise

Rectify (a) If there are steric clashes, consider the most conservative mutation that removes the bump.

Rectify (b) If the theoretical ‘Bioactive conformation’ is not stable, check the Ramachandran plot of the hit for residues in disallowed regions. Consider stabilizing the structure with unnatural amino acids, e.g. alpha-amino isobutyric acid in alpha helix motifs.

1-28. (canceled)
 29. An engineered protein comprising an amino acid sequence set forth in SEQ ID NO: 1 and at least two amino acid residues of another protein which are non-contiguous in primary sequence and represent at least a portion of a functional region of said another protein.
 30. The engineered protein of claim 29, wherein said another protein is a cytokine.
 31. The engineered protein of claim 29, wherein said another protein is a cytokine receptor.
 32. The engineered protein of claim 31, wherein the functional region of said cytokine receptor is a cytokine binding region.
 33. The engineered protein of claim 30, wherein the cytokine is selected from the group consisting of GH, IL-4, IL-6 and G-CSF.
 34. The engineered protein of claim 33, said engineered protein comprising an amino acid sequence selected from the group consisting of the amino acid sequences set forth in SEQ ID NOS: 2-5.
 35. The engineered protein of claim 29, which protein has greater stability than said another protein.
 36. The engineered protein of claim 35, which protein exhibits a function either similar to, or inhibitory of, said another protein.
 37. The engineered protein of claim 36, which engineered protein is a cytokine mimetic.